Background:

Analyzing the characteristics of communication equipment in operational settings is a vital
component of optimal equipment and technology selection. For the US Army, in-depth analysis of
equipment performance and efficiency is vital to ensuring that these technologies work as expected
on the battlefield. Performance characteristics in a real-world situation have to be understood in
a context where actual data is blended with simulation data. We propose that this problem be
tackled in three phases. The first phase involves acquisition, cleaning, and normalization of the
data. It also incorporates familiarization with the parallel environment on the test ARL hardware
platform. The second phase involves extraction of key data metrics that help in understanding the
data and the relationships between data parameters, and in resolving the structural and semantic
conflicts within the data. The third phase builds on the first two and develops techniques to
analyze the data and discover latent associations within it.


Work  Accomplished: 

During this reporting period, we expanded on the Year 1 approach to network characterization.
Using the network characterization, we developed methods to determine network performance and
rank networks, directly meeting the objectives of this grant. In related work, we also developed
measures to find outliers in networks using the notion of betweenness. The work performed is
summarized below:

•  Net  Performance  Rank:  A  Comparison  Measure  to  Determine  Network  Performance 
Ranking 

Here we consider the basic approach to the problem of determining the quality of the equipment
used in communication. The network tested can be of any topology, and while the formulations
do not consider physical factors, these may be introduced through appropriate weighting of
parameters. This is a ranking problem in Multi-Criteria Decision Making (MCDM), and the
approach is based on previous work performed under this contract. An MCDM problem can be
expressed in matrix format, where $A_1, \ldots, A_m$ are the alternative networks to be ranked,
$c_1, c_2, \ldots, c_n$ are the criteria against which the performance of the alternatives is
measured, $x_{ij}$ is the score of alternative $A_i$ with respect to criterion $c_j$, and $w_j$
is the weight of criterion $c_j$.
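
For reference, an MCDM decision matrix consistent with these definitions can be written in the
following standard form. The symbols $x_{ij}$ and $w_j$ are assumed notation, chosen to match the
usual MCDM convention rather than taken from the original formulation.

```latex
\[
D \;=\;
\begin{array}{c|cccc}
       & c_1    & c_2    & \cdots & c_n    \\ \hline
A_1    & x_{11} & x_{12} & \cdots & x_{1n} \\
A_2    & x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A_m    & x_{m1} & x_{m2} & \cdots & x_{mn}
\end{array}
\qquad
W = (w_1, w_2, \ldots, w_n)
\]
```

Each row corresponds to one candidate network and each column to one evaluation criterion. By
common convention (assumed here), the weights are scaled so that they sum to 1.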

While various attributes may be selected as evaluation criteria, we focus on frequent positive
and negative events that may occur between nodes during a given period of time and affect the
network's performance. For instance, retransmissions and other failures are negative events that
decrease the network's efficiency. Establishing the decision matrix requires two steps: 1. select
the collection of criteria, and 2. score the networks on those criteria.

To score networks on the criteria, we propose a new density-based approach that computes the
global probability density of a given positive or negative frequent event (criterion) for each
network. For this purpose, we developed a novel method, the "Correlation Density Rank", which
finds the probability density distribution of the related frequent event over all nodes. We then
aggregate these densities over the whole network using the Rényi entropy to obtain the network's
performance score on the related criterion. The TOPSIS method is then used to calculate the
performance rank of the network.
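
The final ranking step can be illustrated with a minimal TOPSIS sketch in Python. It assumes the
per-criterion scores have already been computed (for example, the entropy-aggregated densities
described above). The function name `topsis`, the vector normalization, and the sample scores and
weights are illustrative assumptions, not values from this work.

```python
import math


def topsis(scores, weights, benefit):
    """Return TOPSIS closeness coefficients (larger means better rank).

    scores:  m x n list of lists; scores[i][j] is network i on criterion j
    weights: n criterion weights
    benefit: n booleans; True if a larger score is better for that criterion
    """
    m, n = len(scores), len(weights)
    # Vector-normalize each criterion column, then apply its weight.
    norms = [math.sqrt(sum(scores[i][j] ** 2 for i in range(m))) or 1.0
             for j in range(n)]
    v = [[weights[j] * scores[i][j] / norms[j] for j in range(n)]
         for i in range(m)]
    # Ideal and anti-ideal values for each criterion.
    columns = list(zip(*v))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    closeness = []
    for row in v:
        d_plus = math.dist(row, ideal)    # distance to the ideal point
        d_minus = math.dist(row, anti)    # distance to the anti-ideal point
        total = d_plus + d_minus
        closeness.append(d_minus / total if total else 0.5)
    return closeness


# Hypothetical example: criterion 0 is a positive event, criterion 1 a negative
# event such as retransmissions, so only criterion 0 is a "benefit" criterion.
example = topsis([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]], [0.5, 0.5], [True, False])
ranking = sorted(range(len(example)), key=lambda i: example[i], reverse=True)
print(ranking)
```

Sorting the networks by decreasing closeness coefficient gives the performance rank.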

•  Discovering  Community  Structure  in  Dynamic  Networks 

Recent studies have reported favorable results on discovering communities within dynamic
networks, a major problem in the data mining area. A community is usually defined as a subgraph
with a higher internal density and a lower crossing density with other subgraphs. Various
density-based techniques have been devoted to uncovering community structures in social
networks. In this research effort, a novel distance-based ranking algorithm, called
"Correlation Density Rank", is developed to derive the community tree from the network. Because
real-world networks are constantly evolving, we demonstrate a tree learning algorithm that
employs edit distance as the scoring function to derive an evolving community tree, allowing a
smooth transition between two community trees. We also string communities together to obtain an
evolution graph of the organizational structure, from which new insights into the dynamic
network can be gained. The experiments, conducted on a synthetic graph and on the real-world
network dataset provided by ARL, demonstrate the feasibility and applicability of the framework.

•  Outlier  Detection  in  Network  Data  using  the  Betweenness  Centrality 

Outlier  detection  has  been  used  to  detect  and,  where  appropriate,  remove  anomalous  observations 
from  data.  It  has  important  applications  in  the  field  of  fraud  detection,  network  robustness  analysis, 
and intrusion detection. In this approach, we propose Betweenness Centrality as a technique to
determine the outliers in network analyses. The Betweenness Centrality of a vertex in a graph is
a measure of the participation of the vertex in the shortest paths in the graph. This measure is
widely used in network analyses, where the recursive computation of the betweenness centralities
of vertices is used for community detection. We show the effectiveness of using this method to
detect outliers in network data.

•  A  Smart  Assignment  Technique  with  Consideration  of  Multicriteria  Reciprocal  Judgments 

Assignment problems are important tasks in recommender systems and in one-to-one matching in
social environments. The various approaches proposed for these purposes are normally limited to
considering the cost or profit incurred by each possible assignment. However, most of the time
the alternatives on each side of the assignment have particular criteria for judging the
alternatives on the other side, by which they can evaluate their sufficiency. In this paper, in
order to obtain optimality on both dimensions of the assignment, we consider the concept of
efficiency rather than the cost or profit of each possible assignment. The efficient assignment
is therefore the one that, first, has the maximum optimality in terms of both dimensions of the
assignment and, second, takes into account the significance of the judgment of each assignment
from the viewpoint of the decision maker. To do this, a compound index is defined that includes
the efficiency related to the two-dimensional optimized assignment, for the purpose of measuring
the performance of each possible assignment. Next, a mathematical programming model for the
extended assignment problem is proposed, which is then expressed as a classical integer linear
programming model to determine the assignments with the maximum efficiency. A numerical example
is used to demonstrate the approach.
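
The idea of scoring an assignment from both sides can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the compound index is taken to be the geometric
mean of the two reciprocal judgments, and exhaustive enumeration stands in for the integer linear
programming model, so it is practical only for small instances. The function name and the sample
scores are hypothetical.

```python
from itertools import permutations
from math import sqrt


def efficient_assignment(side_a_scores, side_b_scores):
    """Return (assignment, total) maximizing the summed compound index.

    side_a_scores[i][j]: how alternative i on side A judges alternative j on side B
    side_b_scores[j][i]: how alternative j on side B judges alternative i on side A
    assignment[i] is the side-B alternative matched to side-A alternative i.
    """
    n = len(side_a_scores)

    def compound(i, j):
        # Assumed compound index: geometric mean of the two reciprocal judgments.
        return sqrt(side_a_scores[i][j] * side_b_scores[j][i])

    best, best_total = None, float("-inf")
    for perm in permutations(range(n)):  # every one-to-one matching
        total = sum(compound(i, perm[i]) for i in range(n))
        if total > best_total:
            best, best_total = list(perm), total
    return best, best_total


# Hypothetical judgments for a 2 x 2 problem.
a_judges_b = [[0.9, 0.2], [0.3, 0.8]]
b_judges_a = [[0.8, 0.4], [0.1, 0.9]]
assignment, total = efficient_assignment(a_judges_b, b_judges_a)
print(assignment, round(total, 3))
```

A matching that one side rates highly but the other rates poorly receives a low compound index,
which is the behavior the two-sided efficiency concept is meant to capture.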
Outlier Detection in Network Data using the Betweenness Centrality


H.  B.  Mihiri  Shashikala,  Roy  George,  Khalil  A.  Shujaee 
Department  of  Computer  and  Information  Science 
Clark  Atlanta  University 
Atlanta,  GA  30314 
hewa.shashikala@students.cau.edu 


Abstract - Outlier detection has been used to detect and, where appropriate, remove anomalous
observations from data. It has important applications in the fields of fraud detection, network
robustness analysis, and intrusion detection. In this paper, we propose Betweenness Centrality
(BEC) as a novel technique to determine the outliers in network analyses. The Betweenness
Centrality of a vertex in a graph is a measure of the participation of the vertex in the shortest
paths in the graph. The Betweenness Centrality is widely used in network analyses. Especially in
a social network, the recursive computation of the betweenness centralities of vertices is
performed for community detection and for finding the influential users in the network. In this
paper, we propose that this method is efficient in finding outliers in social network analyses.
Furthermore, we show the effectiveness of the new method using experimental data.

Keywords - outlier detection; network data; betweenness centrality; adjacency matrix.

I.  Introduction 

Outlier detection is an important data mining task that is focused on the discovery of objects
that are exceptional when compared with a set of observations that are considered typical. In
many data analysis tasks, a large number of variables are recorded or sampled. One of the first
steps towards obtaining a coherent analysis is the detection of outlying observations. Although
outliers are often considered to be errors or noise, they may carry important information.
Detected outliers are candidates for aberrant data that may otherwise lead to model
misspecification, biased parameter estimation, and incorrect results. These objects are
important since they often lead to the discovery of exceptional events.
Substantial research has been done in outlier detection, and the methods are classified into
different types with respect to the detection approach being used. Exemplar techniques include
Classification-based methods, Nearest Neighbor-based methods, Cluster-based methods, and
Statistical methods [19]. In the Classification-based approach [31], [32], a model is created
from a set of labeled data points and then a test point is classified into one of the classes
using appropriate testing. Support Vector Machine (SVM) based methods [30], methods based on
Neural Networks [33], and Bayesian Network based methods [25], [28], [34] belong to the
Classification-based techniques. The testing phase of this approach is considerably fast, as
each test point is compared against the pre-built model. The accuracy of Classification-based
methods relies on the availability of accurate pre-classified examples for the different normal
classes, which are rarely available. Nearest Neighbor-based methods [27], [29], [35] involve
distance or similarity measures defined between data points. In this paper, we discuss a new,
graph-based method for finding outliers. This method efficiently reduces the search space by
finding a candidate set of vertices whose betweenness centralities can be computed using
candidate vertices only.

The Betweenness Centrality (BEC) is a measure of the relative importance of a vertex in a graph,
and it is widely used in network analyses such as social network analysis, biological graph
analysis, and road network analysis [1]. In social network analysis, a vertex with higher
centrality can be viewed as more important than a vertex with lower centrality. The BEC of a
vertex in a graph measures the participation of the vertex in the shortest paths in the graph.
There is much previous work on the BEC problem. The concept of the BEC was proposed in [35], but
the definition proposed in [40] is more widely used. More recently, many variants of the
definition have been proposed in [38]. [37] improves the computation time of the BEC based on a
modified breadth-first search algorithm and the dependency of a vertex, and it is the fastest
known algorithm that computes the exact BEC of all the vertices in a graph. Computing the
shortest paths between all pairs of vertices is time consuming. Therefore, another definition of
BEC, based on a random walk, is proposed in [22]. In [42], each vertex has a probability of
visiting its neighbor vertices. Also, [39], [36], and [41] propose approximation algorithms for
computing the betweenness centrality. [43] and [44] adopt the betweenness centrality for
detecting communities in a social network.

Although many methods exist for calculating the BEC, and the BEC is one of the major measures
used in analyzing social network graphs, none of the existing methods addresses the problem of
updating the BEC. In this paper, we propose using the betweenness centrality to find outliers in
network type data.

The next section of this paper describes the terms and definitions that are used throughout the
paper. The paper then outlines the approach and explains the algorithm behind the BEC approach.
To give a better understanding and to demonstrate the accuracy of BEC, several experiments were
conducted with different kinds of synthetic data sets, which are described in detail in the
experimental results section. We apply the BEC technique to find outliers in synthetic data sets
and compare it with an alternative technique, the modified-Shared Nearest Neighbor [3]. Finally,
we conclude the paper with a discussion of the performance, accuracy, and importance of the
proposed technique. From the results of the experiments, it is clear that this technique gives
better results than the modified-Shared Nearest Neighbor for network type data, with high true
positive and true negative values and very low false positive and false negative values.

The m-SNN (modified-Shared Nearest Neighbor) method [3] is based on a non-parametric clustering
algorithm, the Shared Nearest Neighbor (SNN) approach developed by Ertöz et al. [9]. This method
considers the ratio between the sum of the Euclidean distances to the shared nearest neighbors
and the total number of shared neighbors. To differentiate between outliers and normal nodes,
hypothesis testing is used (Babara et al. [18] and Rogers [4]).


II.  Terms  and  Definitions 
Betweenness  Centrality 

A  measure  that  computes  the  relative  importance  of  a  vertex  in 
a  graph.  The  formal  definition  is  presented  below. 

A graph is represented by $G = (V, E)$, where $V$ is the set of vertices and
$E \subseteq V \times V$ is the set of edges. A path in a graph is represented by a sequence of
vertices $(v_1, \ldots, v_n)$, where $v_i, v_j \in V$ and $v_i \neq v_j$ for
$1 \le i, j \le n$, $i \neq j$, except possibly $v_1 = v_n$.

Definition 1 (Betweenness Centrality). The betweenness centrality of a vertex $v_j \in G$ is:

$$c(v_j) = \sum_{i,k} \frac{\sigma_{v_i v_k}(v_j)}{\sigma_{v_i v_k}}$$

where $v_i, v_j, v_k \in V$, $i \neq j \neq k$, $\sigma_{v_i v_k}(v_j)$ is the number of
shortest paths between $v_i$ and $v_k$ that include $v_j$, and $\sigma_{v_i v_k}$ is the number
of shortest paths between $v_i$ and $v_k$. The betweenness centrality can be computed as follows:

1. For each pair of vertices ($v_i$ and $v_k$), compute the shortest paths between the two
vertices.

2. For each pair of vertices, compute the ratio of each vertex participating in the shortest
path(s). The ratio is the number of shortest paths between $v_i$ and $v_k$ that go through $v_j$
divided by the number of shortest paths between $v_i$ and $v_k$.

3. Accumulate the ratio for all pairs of vertices.
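
The three steps above can be sketched directly in Python for an unweighted, undirected graph
given as an adjacency list. This is a straightforward illustration of the definition, not the
optimized algorithm of [37]. It assumes each unordered pair of vertices is counted once, and the
function names are illustrative.

```python
from collections import deque


def bfs_counts(adj, s):
    """Return (dist, sigma): hop distances from s and shortest-path counts from s."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Accumulate, for every vertex, its share of shortest paths over all pairs."""
    nodes = list(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}           # step 1
    c = {v: 0.0 for v in nodes}
    for a, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[a + 1:]:                              # each unordered pair once
            if t not in dist_s:
                continue                                     # t unreachable from s
            for v in nodes:
                if v == s or v == t or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v is on a shortest s-t path iff the distances add up exactly.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    c[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]   # steps 2 and 3
    return c


# A four-node cycle: each opposite pair has two shortest paths, one via each neighbor.
square = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
print(betweenness(square))
```

If ordered pairs are counted instead, every value simply doubles for an undirected graph, so the
relative ordering of vertices is unchanged.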


Definition  2  (Adjacency  Matrices). 

The adjacency matrix of a finite graph G on n vertices is the n x n matrix where the
non-diagonal entry $a_{ij}$ is the number of edges from vertex i to vertex j, and the diagonal
entry $a_{ii}$, depending on the convention, is either once or twice the
number  of  edges  (loops)  from  vertex  i  to  itself.  Undirected 
graphs  often  use  the  latter  convention  of  counting  loops  twice, 
whereas  directed  graphs  typically  use  the  former  convention. 


Figure  1:  Shortest  paths  through  nodes  in  destination  IP 
addresses. 


Figure 2: Undirected graph with adjacency matrix.

Figure 2 shows the adjacency matrix for an undirected graph. A, B, C, D, E, and F represent the
nodes. All diagonal values are zero, and if two nodes are connected, the corresponding matrix
entry is 1.


III.  Approach 
Figure 3: The resulting adjacency matrix including Id numbers in the first row and first column.


This outlier detection method is based on the BEC for network data and the p-value technique of
hypothesis testing for finding outliers. For each data point, we calculate its BEC using the
adjacency matrix of the network data. To find the adjacency matrix for the data set, we
calculate the shortest paths through nodes in the destination IP addresses. Figure 1 shows the
shortest paths through nodes in destination IP addresses. The numbers represent the label of
each node for the given data points. An adjacency matrix is created from the calculated shortest
paths, using sparse matrices in order to increase computational speed. Our calculation is based
on undirected network type data. The adjacency matrix calculation yields an adjacency matrix
from friendship nominations, stored as a sparse matrix. The resulting adjacency matrix includes
Id numbers in the first row and first column; it is shown in Figure 3. To find the BEC, the
influence domain of each node in the given adjacency matrix is calculated for a given step, and
the calculation returns the undirected BEC for each node of the undirected adjacency matrix
'adj'. Matrix 'adj' must represent an undirected network and may or may not be sparse. The
matrix is simple to change if the graph is directed. 'adj' is assumed to have Id numbers in the
first row. The code could probably be vectorised further to speed up calculations for large
adjacency matrices.


As our method needs to find the adjacency matrix for the data points, it is required to
calculate the shortest paths between each data point and every other data point. Since we have
n data points, the complexity of calculating the shortest paths is $O(n^2)$. Finally, to find
outliers we need to compare each data point with the other data points, which also results in
$O(n^2)$ complexity.
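
The final outlier-flagging step can be illustrated with a short Python sketch. The text does not
specify the test statistic, so this example assumes, purely for illustration, a two-sided z-test
under a normal approximation of the BEC values; other tests could be substituted. The function
name, the default threshold argument, and the sample values are hypothetical.

```python
import math
from statistics import mean, pstdev


def flag_outliers(bec, alpha=0.05):
    """Return the nodes whose BEC value has a two-sided p-value below alpha.

    bec:   dict mapping node label -> betweenness centrality
    alpha: significance level (0.05 corresponds to 95% confidence)
    """
    values = list(bec.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []                      # all values identical, nothing stands out
    flagged = []
    for node, c in bec.items():
        z = (c - mu) / sd
        # Two-sided tail probability of a standard normal (assumed model).
        p = math.erfc(abs(z) / math.sqrt(2))
        if p < alpha:
            flagged.append(node)
    return flagged


# Hypothetical centralities: twenty ordinary nodes and one unusually central node.
sample = {f"n{i}": 1.0 for i in range(20)}
sample["hub"] = 50.0
print(flag_outliers(sample))
```

The loop itself is linear in the number of nodes once the centralities are known; the quadratic
cost discussed above comes from the shortest-path computation.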

IV.  Experimental  Results 

This section describes the experiments and results with synthetic data sets, together with how
the data was generated. The experiments were run with the significance level α taken as 0.05,
i.e., the experimental results have 95% confidence.


A.  Synthetic  Data 

To cover a broad range of applications, network type data sets were generated. We apply a
rigorous set of tests to the data in order to understand the strengths and weaknesses of the
method. In all cases we use probabilistic, distribution-based data generation, which takes user
inputs to decide the parameters of the data pattern, i.e., it identifies the variables and then
uses a probabilistic model to generate the required number of data points and outliers.

After generating the data, each set of data points with scaling features was tested using both
the BEC method and the m-SNN [3] outlier detection method. The m-SNN method is a modification of
the SNN (Shared Nearest Neighbor) method that aids in outlier detection.

In this analysis, we generated network data sets of three different sizes, viz. small (fewer
than 100 points), medium (between 100 and 1000 points), and large (more than 1000 points). An
example of a small data set is a set with 56 total data points, of which 6 were generated as
global outliers. After applying our new BEC method and the m-SNN method with α = 0.05, all the
expected global outliers were detected by the BEC method. Though the m-SNN approach was also
able to detect all the labeled outliers correctly, its results were not as accurate or precise
as those of the BEC method.

The results obtained are summarized in Table 2, which reports the True Positive (TP), False
Positive (FP), True Negative (TN), and False Negative (FN) values as percentages. It shows the
average results for the three different sizes of data sets. From the results, it is clear that
the BEC has very high TP and TN percentages and very low FP and FN percentages compared to the
m-SNN approach. The proposed method also has the best results for the network type data. On
comparing the results of complex path data sets, it is evident that the BEC is more robust in
finding outliers (compared to m-SNN), particularly with respect to true positives and minimizing
false negatives.

Procedure: Betweenness Centrality Based Outlier Detection

Inputs: data[], a set of network data points;
Output: List of Outliers

// Finding the adjacency matrix for all the data points
Inputs: data[], Adjacency matrix for data points;
Output: List of Betweenness Centrality for all data points

// Finding the Betweenness Centrality for all the data points
Inputs: data[], Betweenness Centrality for data points;
Output: List of Outliers

// Finding the outliers based on the p-value method

Table  1:  Betweenness  centrality  Based  Outlier  Detection  Algorithm 


Method    TP(%)    FP(%)    TN(%)    FN(%)
BEC       100.0    0.5      99.2     0.2
m-SNN     100.0    3.5      96.5     2.3

Table  2:  Experimental  results  for  BEC  and  m-SNN. 

V. Conclusions

In this paper, we have described an algorithm based on graph theory that is capable of detecting
outliers in different types of network data sets. The method combines the adjacency matrix and
betweenness centralities, avoids assumptions about data distributions, and uses hypothesis
testing to detect outliers. Through a series of experiments, we have shown that this method
achieves good results, with very high true positive and true negative values, and that the BEC
approach produces outlier detection results equivalent to or better than the m-SNN method.
Furthermore, a modified version of this method could be used to identify outliers in order to
update a social network graph. Currently we are reformulating the algorithm to improve its run
time efficiency and parallelizing the code to make it amenable to massive data sets.
Abstract - PageRank is a well-known algorithm that has been used to understand the structure of
the Web. In its classical formulation the algorithm considers only forward-looking paths in its
analysis, a typical web scenario. We propose a generalization of the PageRank algorithm based on
both out-links and in-links. This generalization enables the elimination of network anomalies
and increases the applicability of the algorithm to an array of new applications in networked
data. Through experimental results we illustrate that the proposed generalized PageRank
minimizes the effect of network anomalies and results in a more realistic representation of the
network.

Keywords - Search Engine; PageRank; Web Structure; Web Mining; Spider-Trap; dead-end; Taxation;
Web spamming.

I.  Introduction 

With the rapid growth of the Web, users can easily get lost in the massive, dynamic, and mostly
unstructured network topology. Finding users' needs and providing useful information are the
primary goals of website owners. Web structure mining [1],[2],[3] is an approach used to
categorize users and pages. It does so by analyzing the users' patterns of behavior, the content
of the pages, and the order of the Uniform Resource Locators (URLs) that tend to be accessed. In
particular, Web structure mining plays an important role in guiding the users through the maze.
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and arcs in a directed
graph. The problem is that this graph is massive, with more than a trillion nodes, several
billion links, and growing exponentially with time. A classical approach to characterizing the
structure of the Web graph is the PageRank algorithm, which is a method of finding page
importance.

The original PageRank algorithm [3],[4],[5], one of the most widely used structuring algorithms,
states that a page has a high rank if the sum of the ranks of its backlinks is high. Google
effectively applied the PageRank algorithm to the Google search engine [4]. Xing and Ghorbani
[6] enhanced the basic algorithm through a Weighted PageRank (WPR) algorithm, which assigns
larger rank values to the more important pages rather than dividing the rank value of a page
evenly among its outgoing linked pages. Each outgoing linked page gets a value proportional to
its popularity (its number of in-links and out-links). Kleinberg [7] identifies two different
forms of Web pages, called hubs and authorities, which lead to the definition of an iterative
algorithm called Hyperlink Induced Topic Search (HITS) [8].

Bidoki and Yazdani [9] proposed a novel recursive method based on reinforcement learning [10],
called "DistanceRank", which treats the distance between pages as a punishment when computing
the ranks of web pages. The algorithm is less sensitive to the "rich-get-richer" problem
[9],[11] and finds important pages faster than others. The DirichletRank algorithm has been
proposed by X. Wang et al. [12] to eliminate the zero-one gap problem found in the PageRank
algorithm proposed by Brin and Page [4]. The zero-one gap problem occurs due to the ad hoc way
of computing transition probabilities. They have also proved that this algorithm is more robust
against several common link spams and is more stable under link perturbations. Singh and Kumar
[13] provide a review and comparison of important PageRank-based algorithms.

As search engines are used to find the way around the Web, there is an opportunity to fool
search engines into leading people to a particular page. This is the problem of web spamming
[14], which is a method of maliciously inducing bias in search engines so that certain target
pages are ranked much higher than they deserve. This leads to poor quality of search results and
in turn reduces the trust in the search engine. Consequently, anti-spamming is a big challenge
for all search engines. Earlier Web spamming was done by adding a variety of query keywords to
page contents regardless of their relevance. In link spamming [15], the spammers intentionally
set up link structures involving a lot of interconnected pages to boost the PageRank scores of a
small number of target pages. Link spamming not only increases the rank gains but is also harder
for search engines to detect. It is important to point out that link spamming is a special case
of the spider-trap [16]. At present, the Taxation method [16] is the most significant way to
diminish the influence of spider-traps and dead-ends, by allowing the random surfer, with a
small probability, to teleport to a random page in each iteration.

This article has two main contributions. First, we present a generalized formulation of the
PageRank algorithm based on transition probabilities, which takes both the in-links and
out-links of a node and their influence rates into account in order to calculate PageRanks. This
permits the application of this approach to a wide variety of network problems that require
consideration of the current state values (and PageRank) as a function of past state
transitions. Second, we describe a novel approach of adding virtual edges to a graph that
permits more realistic computations of PageRank, negating the effect of network anomalies such
as spider-traps and dead-ends.

The paper is organized as follows. In Section 2, a brief background review of the basic concepts
for computing PageRanks based on transition probabilities is presented, and the problems related
to network anomalies such as spider-traps and dead-ends are stated together with their solution
method based on Taxation. In Section 3, we introduce the proposed general approach for
determining PageRank. In Section 4, we apply our PageRank method to a typical graph with all
types of possible structures and inter/intra-correlations and compare our results with the
baseline technique. In Section 5, we conclude by describing the contribution of our method and
discussing its results.

II. Overview on the PageRank Approach Based on Transition Probabilities

PageRank is a function that assigns a real number to each
page in the Web. We begin by defining the basic, idealized
PageRank, and follow it by modifications that are necessary
for dealing with some real-world problems concerning the
structure of the Web. Imagine a random surfer on the Web, going
from page to page by randomly choosing an outgoing link from
the current page to get to the next. This can lead
to dead-ends at pages with no outgoing links, or to cycles
around cliques of interconnected pages. This theoretical
random walk is known as a Markov chain or Markov process
[16], [17].

In general, we can define the transition matrix of the Web
to describe what happens to random surfers after one step.
This matrix $M$ has $n$ rows and columns if there are $n$ pages.
The element $m_{ij}$ in row $i$ and column $j$ has value $1/k$ if
page $j$ has $k$ arcs out and one of them is to page $i$. Otherwise,
$m_{ij} = 0$. The probability distribution for the location of a
random surfer can be described by a column vector whose
$j$th component is the probability that the surfer is at page $j$.
This probability is the (idealized) PageRank function.

Suppose we start a random surfer at any of the $n$ pages of
the Web with equal probability. Then the initial vector $v_0$
will have $1/n$ for each component. If $M$ is the transition
matrix of the Web, then after one step the probability
distribution of the surfer's location will be $M v_0$; after two
steps it will be $M(M v_0) = M^2 v_0$, and so on. In general,
multiplying the initial vector $v_0$ by $M$ a total of $i$ times
gives the distribution of the surfer after $i$ steps.

This sort of behavior is an example of a Markov
process. It is known that the distribution of the surfer
approaches a limiting distribution $v$ that satisfies $v = Mv$,
provided two conditions are met:

1) The graph is strongly connected; that is, it is possible
to get from any node to any other node.

2) There are no dead-ends: nodes that have no arcs out.

In fact, because $M$ is stochastic, meaning that each of its
columns adds up to 1, $v$ is the principal eigenvector. Note
also that, because $M$ is stochastic, the eigenvalue associated
with the principal eigenvector is 1. The principal eigenvector
of $M$ tells us where the surfer is most likely to be after
a large number of steps. The intuition behind PageRank is that
the more likely a surfer is to be at a page, the more important
the page is. We can compute the principal eigenvector of $M$ by
starting with the initial vector $v_0$ and multiplying by $M$ some
number of times, until the vector we get shows little change
at each round. In practice, for the Web itself, 50-75
iterations are sufficient to converge to within the error limits
of double-precision arithmetic.
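
The repeated multiplication described above can be sketched in a few lines of Python. This is only an illustration: the three-page graph, the function names and the stopping tolerance are assumptions made for the example and do not come from the paper.

```python
def mat_vec(M, v):
    """Multiply a square matrix (given as a list of rows) by a column vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def power_iteration(M, tol=1e-12, max_iter=1000):
    """Repeat v <- M v from the uniform vector until the change drops below tol."""
    n = len(M)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        v_next = mat_vec(M, v)
        if sum(abs(a - b) for a, b in zip(v_next, v)) < tol:
            return v_next
        v = v_next
    return v


# Hypothetical graph (not from the paper): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
# Column j holds the out-link probabilities of page j, so each column sums to 1.
M = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]

ranks = power_iteration(M)
print(ranks)  # approaches (0.4, 0.2, 0.4), which solves v = M v
```

The example graph is strongly connected and has no dead-ends, so both conditions listed above hold and the iteration settles on a single limiting vector.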

A.  Structure  of  the  Web 

It would be nice if Web pages were strongly connected.
However, this is not the case in practice. An early study of the
Web found it to have the structure shown in Figure 1. There
was a large strongly connected component (SCC), but there
were several other portions that were almost as large [18]:

•  The  in-component,  consisting  of  pages  that  could 
reach  the  SCC  by  following  links,  but  were  not 
reachable  from  the  SCC. 

•  The  out-component,  consisting  of  pages  reachable 
from  the  SCC  but  unable  to  reach  the  SCC. 

•  Tendrils,  which  are  of  two  types.  Some  tendrils 
consist  of  pages  reachable  from  the  in-component 
but  not  able  to  reach  the  in-component.  The  other 
tendrils  can  reach  the  out-component,  but  are  not 
reachable  from  the  out-component. 


Figure 1. The "bowtie" representation of the Web [22]


In addition, there were small numbers of pages found
in either

• Tubes, which are pages reachable from the in-component
and able to reach the out-component, but
unable to reach the SCC or be reached from the
SCC.

• Isolated components that are unreachable from the
large components (the SCC, in- and out-components)
and unable to reach those components.

As a result, PageRank is usually modified to prevent such
anomalies. There are, in principle, two problems we need to
avoid. The first is the dead-end, a page that has no links out,
which produces a zero column in the forward transition
matrix and consequently causes all PageRanks to
become zero. The second is a group of pages that all
have out-links but never link to any page outside the group.
These structures are called spider-traps. Both problems are
solved by a method called "taxation," where we assume a
random surfer has a finite probability of leaving the Web at
any step, and new surfers are started at each page.

B.  Taxation 

To avoid the problem of spider-traps and dead-ends, we
modify the calculation of PageRank by allowing each
random surfer a small probability of teleporting to a random
page, rather than following an out-link from their current
page. The iterative step, where we compute a new vector
estimate of PageRanks $v'$ from the current PageRank
estimate $v$ and the transition matrix $M$, is

\[ v' = \beta M v + (1 - \beta) e / n \tag{1} \]

where $\beta$ is a chosen constant, usually in the range 0.8 to
0.9, $e$ is a vector of all 1's with the appropriate number of
components, and $n$ is the number of nodes in the Web graph.
The term $\beta M v$ represents the case where, with probability
$\beta$, the random surfer decides to follow an out-link from their
present page. The term $(1 - \beta) e / n$ is a vector each of whose
components has value $(1 - \beta)/n$ and represents the
introduction, with probability $1 - \beta$, of a new random surfer
at a random page.

Although this formulation controls the effect of
spider-traps and dead-ends and distributes some PageRank to
each of the other nodes, the components of a spider-trap
still manage to keep most of the PageRank for
themselves. Therefore, the PageRanks of the nodes are still
unreasonable. For instance, in Figure 2, C is a simple
spider-trap of one node, and the transition matrix is as follows:

\[
M = \begin{bmatrix}
0 & 1/2 & 0 & 0 \\
1/3 & 0 & 0 & 1/2 \\
1/3 & 0 & 1 & 1/2 \\
1/3 & 1/2 & 0 & 0
\end{bmatrix} \tag{2}
\]

Figure 2. A graph with a one-node spider trap

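If we perform the usual iteration to compute the
PageRank of the nodes, we get

\[
\begin{bmatrix} 1/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{bmatrix},
\begin{bmatrix} 3/24 \\ 5/24 \\ 11/24 \\ 5/24 \end{bmatrix},
\begin{bmatrix} 5/48 \\ 7/48 \\ 29/48 \\ 7/48 \end{bmatrix},
\begin{bmatrix} 21/288 \\ 31/288 \\ 205/288 \\ 31/288 \end{bmatrix},
\ldots,
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]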

As  predicted,  all  the  PageRank  is  at  C,  since  once  there  a 
random  surfer  can  never  leave.  To  avoid  the  problem 
illustrated,  we  modify  the  calculation  of  PageRank  by  the 
Taxation  method.  Thus,  the  equation  for  the  iteration 
becomes 


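\[
v' = \begin{bmatrix}
0 & 2/5 & 0 & 0 \\
4/15 & 0 & 0 & 2/5 \\
4/15 & 0 & 4/5 & 2/5 \\
4/15 & 2/5 & 0 & 0
\end{bmatrix} v +
\begin{bmatrix} 1/20 \\ 1/20 \\ 1/20 \\ 1/20 \end{bmatrix}
\]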

Notice that we have incorporated the factor $\beta$ into $M$ by
multiplying each of its elements by 4/5. The components of
the vector $(1 - \beta) e / n$ are each 1/20, since $1 - \beta = 1/5$
and $n = 4$. The first few iterations are:

\[
\begin{bmatrix} 1/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{bmatrix},
\begin{bmatrix} 9/60 \\ 13/60 \\ 25/60 \\ 13/60 \end{bmatrix},
\begin{bmatrix} 41/300 \\ 53/300 \\ 153/300 \\ 53/300 \end{bmatrix},
\begin{bmatrix} 543/4500 \\ 707/4500 \\ 2543/4500 \\ 707/4500 \end{bmatrix},
\ldots,
\begin{bmatrix} 15/148 \\ 19/148 \\ 95/148 \\ 19/148 \end{bmatrix}
\]

By being a spider trap, C has still managed to get more
than half of the PageRank for itself. However, the effect has
been limited, and each of the nodes gets some of the
PageRank.
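
The spider-trap example can be checked with exact rational arithmetic. The following is a minimal sketch, assuming the node order of the matrix in equation (2) with the spider trap C as the third node. The helper name `step` is introduced here for illustration only.

```python
from fractions import Fraction as F

# Transition matrix of equation (2): column j lists the out-link
# probabilities of node j; the third node is the one-node spider trap C.
M = [
    [F(0),    F(1, 2), F(0), F(0)],
    [F(1, 3), F(0),    F(0), F(1, 2)],
    [F(1, 3), F(0),    F(1), F(1, 2)],
    [F(1, 3), F(1, 2), F(0), F(0)],
]


def step(M, v, beta):
    """One PageRank update: v' = beta * M v + (1 - beta) * e / n."""
    n = len(v)
    teleport = (1 - beta) / n
    return [beta * sum(m * x for m, x in zip(row, v)) + teleport for row in M]


v0 = [F(1, 4)] * 4

# beta = 1 is the idealized iteration; the spider trap keeps gaining rank.
plain = step(M, v0, F(1))

# beta = 4/5 is the taxed iteration of the worked example.
taxed = step(M, v0, F(4, 5))

# The limiting vector quoted in the text is a fixed point of the taxed update.
limit = [F(15, 148), F(19, 148), F(95, 148), F(19, 148)]

print(plain)
print(taxed)
print(step(M, limit, F(4, 5)) == limit)
```

`Fraction` prints values in lowest terms, so 3/24 appears as 1/8 and 9/60 as 3/20. The fixed-point check confirms that taxation leaves C with 95/148 of the total rank, which is still more than half.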


III.  A  Generalized  Method 

On the Web, a link from an important page increases the
significance of the page it points to. However, there are other
networks in which not only in-links but also out-links carry
weight. For instance, in social networks, connecting to eminent
people (out-link) is as crucial as being connected to by key
persons (in-link) in evaluating the prominence of a member.
Therefore, sorting and grading the nodes of a graph
based only on in-links will sometimes result in an incorrect
evaluation. We therefore include out-links, and the rate of
their impact relative to in-links, in our computations.

A.  Algorithm 

Suppose we start a random surfer at any of the $n$ pages
of the Web with equal probability. Then the initial vector $v_0$
will have $1/n$ for each component. If $M_f$ is the forward
transition matrix of the Web, then after one forward step the
probability distribution of the surfer's next location will be
$M_f v_0$, and if $M_b$ is the backward transition matrix of the
Web, then after one backward step the probability
distribution of the surfer's previous location will be $M_b v_0$.

We also consider the importance weight factors of in-links
($\beta$) and out-links ($1 - \beta$).

Note that the matrix $\beta M_f + (1 - \beta) M_b$ is a linear
combination of the next and previous surfer locations, and it is
also stochastic because it is a convex combination of two
stochastic matrices (the weights $\beta$ and $1 - \beta$ are
non-negative and sum to 1). So the eigenvalue associated with its
principal eigenvector is 1. The principal eigenvector of
$\beta M_f + (1 - \beta) M_b$ tells us where the surfer is most likely
to be after a long time. Recall that the intuition behind
PageRank is that the more likely a surfer is to be at a page,
the more important the page is. We can compute the
principal eigenvector of $\beta M_f + (1 - \beta) M_b$ by starting
with the initial vector $v_0$ and multiplying by
$\beta M_f + (1 - \beta) M_b$ some number of times, until the
vector we get shows little change at each round. Considering
this matrix instead of $M_f$ has two advantages. First, in
computing the PageRank of a node, the importance of its
neighbors under both types of relationship (out-link and
in-link) and their arbitrary impact rates (parameter $\beta$) are
taken into account. Second, this method avoids the problems of
dead-ends and spider-traps, because our computation takes a
linear combination of the probability of entering a node from
other nodes and the probability of exiting from it to other nodes.
Therefore, when $\beta \neq 0$ and $\beta \neq 1$, the columns related to
dead-ends are not completely zero. Likewise, for the
spider-trap columns, the probabilities related to other nodes are
not zero, so the traps cannot absorb an unreasonable share of the
rank. For the cases $\beta = 1$ and $\beta = 0$, we propose below
another idea (adding virtual edges) by which the
random surfer can exit from dead-ends and spider-traps.
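
A minimal sketch of the combined matrix follows. The paper does not spell out how $M_b$ is built, so the sketch assumes it is the transition matrix of the graph with every edge reversed. The two small graphs and all function names are hypothetical.

```python
def transition_matrix(n, edges):
    """Entry [i][j] is 1/k if node j has k out-links and one of them goes to i."""
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    M = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        M[dst][src] = 1.0 / out_degree[src]
    return M


def mix(M_f, M_b, beta):
    """Combined matrix beta * M_f + (1 - beta) * M_b."""
    n = len(M_f)
    return [[beta * M_f[i][j] + (1 - beta) * M_b[i][j] for j in range(n)]
            for i in range(n)]


def principal_eigenvector(M, iterations=200):
    """Power iteration, renormalized each round so the components sum to 1."""
    n = len(M)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
    return v


# Hypothetical strongly connected graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
M_f = transition_matrix(3, edges)
# Assumption: the backward matrix follows every link in reverse.
M_b = transition_matrix(3, [(dst, src) for src, dst in edges])
M = mix(M_f, M_b, beta=0.7)
column_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]
ranks = principal_eigenvector(M)

# Dead-end variant: dropping 2 -> 0 leaves node 2 without out-links.
dead_edges = [(0, 1), (0, 2), (1, 2)]
D_f = transition_matrix(3, dead_edges)
D_b = transition_matrix(3, [(dst, src) for src, dst in dead_edges])
D = mix(D_f, D_b, beta=0.7)
dead_end_column_forward = [D_f[i][2] for i in range(3)]
dead_end_column_mixed = [D[i][2] for i in range(3)]

print(column_sums, ranks)
print(dead_end_column_forward, dead_end_column_mixed)
```

In the first graph both matrices are column-stochastic, and so is their mix. In the dead-end variant the forward column of node 2 is all zeros while the mixed column is not. Under this sketch's assumption that column need not sum to 1, which is why the iteration renormalizes at every round.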

The proposed algorithm is as follows:

Step 1: Find the forward and backward transition
matrices.

Step 2: Select the appropriate formula and keep
iterating until it converges.

In this step, three cases are possible, characterized
as follows:

Case 1: $\beta \neq 0$ and $\beta \neq 1$. Both forward
and backward trends are important for calculating
PageRanks. Thus, we only need to calculate the
eigenvector of the matrix $\beta M_f + (1 - \beta) M_b$.

Case 2: $\beta = 1$. We need only the forward matrix to
calculate PageRanks. If there is no dead-end or
spider-trap in the graph, the vector of PageRanks is
the eigenvector of $M_f$. If there are dead-ends or
spider-traps, the eigenvector of $M_f$ assigns most of
the PageRank to the spider-traps and dead-ends, which is
not realistic. Thus we add enough virtual out-links to remove
these spider-trap and dead-end situations. For each
dead-end and each spider-trap, we add a virtual edge
whose source is the dead-end itself or one
member of the spider-trap, respectively. Its
destination can be any arbitrary node
other than the dead-ends and spider-trap members (see
Figure 3, green edges). Hence, if $v$ is the eigenvector
of the matrix $M'_f$ (the forward transition
matrix after adding virtual links), then to find the
final PageRanks of the vertices we have to remove the
effect of these virtual links on the PageRanks by
calculating

\[ v - (M'_f - M_f) v \]

Case 3: $\beta = 0$. Here only the backward trend (out-links) is
considered in the calculation of PageRanks,
so we need only the backward matrix to determine
PageRanks. If there are no in-component or
in-tendril vertices in the graph, the vector of PageRanks is
the eigenvector of $M_b$. If there are in-component or
in-tendril vertices, the eigenvector of $M_b$ assigns most of
the PageRank to the in-component and in-tendril vertices,
which is not realistic. Thus we add enough virtual
in-links to remove these in-component and in-tendril
situations (see Figure 3, red edges). After computing the
eigenvector of the new backward matrix $M'_b$, we have to
remove the effect of these virtual links on the PageRanks.
If $v$ is the eigenvector of the matrix $M'_b$ (the backward
transition matrix after adding virtual links), the final
PageRanks of the vertices are
$v - (M'_b - M_b) v$.

Step 3: Normalize the PageRank vector to obtain the
probability distribution over the vertices.

If we consider a matrix $A$ that holds the pairwise
comparisons of the importance of vertices, the
eigenvector of this matrix is the probability distribution
over the vertices.

Here $w$ is the probability distribution vector of the vertices,
whose components sum to 1, and $w_i$ is the importance of
vertex $i$. Instead of the entries $w_i / w_j$ in matrix $A$, we
use $p_i / p_j$, where $p_i$ and $p_j$ are the PageRanks of nodes $i$
and $j$:

\[ A = [a_{ij}], \qquad a_{ij} = p_i / p_j \]

We then calculate the eigenvector of matrix $A$ to get the
probability distribution over the vertices.
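
A short worked check of this normalization step, using exact fractions. The three scores below are hypothetical. The sketch relies only on the definition $a_{ij} = p_i / p_j$, from which it follows that $A w = n w$ for $w = p / \sum_k p_k$.

```python
from fractions import Fraction as F

# Hypothetical unnormalized PageRank scores for three vertices.
p = [F(3), F(1), F(2)]
n = len(p)

# Pairwise comparison matrix: A[i][j] = p_i / p_j.
A = [[p_i / p_j for p_j in p] for p_i in p]

# Candidate distribution: the scores divided by their sum.
total = sum(p)
w = [p_i / total for p_i in p]

# A w equals n * w, so w is an eigenvector of A with eigenvalue n.
Aw = [sum(a * x for a, x in zip(row, w)) for row in A]

print(w)
print(Aw == [n * x for x in w])
```

In other words, for a matrix built this way the eigenvector computation reduces to dividing each PageRank by the sum of all PageRanks.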


B.  Biased  Random  Walk 

In order to bias the rank of all nodes with respect to a
special subset of nodes, we use the Biased Random Walk
method, in which the random surfer, in each iteration,
jumps to one of the members of the subset with equal
probability. Its most important application is topic-sensitive
PageRank [19] in search engines. The consequence of this
approach is that random surfers are likely to be at an
identified page, or at a page reachable along a short path from
one of these known pages; the pages they link to are
also likely to be about the same topic. The mathematical
formulation for the iteration that yields topic-sensitive
PageRank is similar to the equation we used for general
PageRank. The only difference is how we add the new
surfers. Suppose $S$ is a set of integers consisting of the
row/column numbers for the pages we have identified as
belonging to a certain topic (called the teleport set). Let $e_S$
be a vector that has 1 in the components in $S$ and 0 in other
components. Then the topic-sensitive PageRank for $S$ is the
limit of the iteration

\[
v' = \alpha \left( \beta M_f + (1 - \beta) M_b \right) v + (1 - \alpha) e_S / |S|,
\qquad 0.8 \le \alpha \le 0.9 \tag{7}
\]

Here, as usual, $M_f$ and $M_b$ are the transition matrices of the
Web, and $|S|$ is the size of set $S$.
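
The biased iteration can be sketched as follows. The three-node matrices, the teleport sets and the parameter values ($\alpha = 0.85$, $\beta = 0.7$) are assumptions chosen for illustration. The backward matrix is again taken to be the transition matrix of the reversed graph, which the paper does not state explicitly.

```python
def biased_step(M_f, M_b, v, S, alpha, beta):
    """v' = alpha * (beta*M_f + (1-beta)*M_b) v + (1-alpha) * e_S / |S|."""
    n = len(v)
    walked = [
        sum((beta * M_f[i][j] + (1 - beta) * M_b[i][j]) * v[j] for j in range(n))
        for i in range(n)
    ]
    jump = (1 - alpha) / len(S)
    return [alpha * walked[i] + (jump if i in S else 0.0) for i in range(n)]


def biased_pagerank(M_f, M_b, S, alpha=0.85, beta=0.7, iterations=200):
    """Iterate the biased update from the uniform vector."""
    n = len(M_f)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = biased_step(M_f, M_b, v, S, alpha, beta)
    return v


# Hypothetical graph 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0 and its reversal.
M_f = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.0], [0.5, 1.0, 0.0]]
M_b = [[0.0, 1.0, 0.5], [0.0, 0.0, 0.5], [1.0, 0.0, 0.0]]

biased = biased_pagerank(M_f, M_b, S={1})         # teleport only to node 1
uniform = biased_pagerank(M_f, M_b, S={0, 1, 2})  # teleport to every node

print(biased)
print(uniform)
```

Restricting the teleport set to node 1 raises its rank relative to the uniform-teleport run, which is the bias the section describes.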

IV. The Experiment

Figure 3 shows a graph with 20 vertices that includes all
the kinds of network structures described in Section 2.

SCC: {1, 2, 4, 5, 7, 8, 9, 10, 15, 17, 18, 20}   TUBE: {16-6}
OUT-COMPONENT: {6, 11, 12}   IN-COMPONENT: {3, 13, 16}
OUT-TENDRIL: {14}   IN-TENDRIL: {19}

Figure 3. Synthetic Graph Example

In Case 2 ($\beta = 1$), there is a dead-end at
vertex 14 and a spider-trap on the vertex set {6, 11,
12}; to remove the dead-end and the spider-trap, we
add 2 virtual out-links (green edges) on these vertices.
In Case 3 ($\beta = 0$), there is an in-component on the
vertex set {3, 13, 16}; to remove its adverse effect on
PageRank, we add 2 virtual in-links (red edges) on these
vertices. For completeness, we also compute the biased
random walk on Case 1. Comparing the results with Case 1
(Table I), it is clear that the PageRanks are biased toward the
set S = {2, 4, 7, 18}. As expected, the nodes of S and the nodes
that are pointed to by S get higher ranks.


TABLE I. PageRank vector for Cases 1 and 3 and for the biased random walk.

Case 1 ($\beta = 0.7$)     Biased random walk on Case 1     Case 3 ($\beta = 0$)
Node   PageRank            Node   PageRank                  Node   PageRank
11     0.945               5      0.9937                    17     0.57916
12     0.2177              11     0.9878                    10     0.38611
6      0.1767              18     0.9703                    13     0.36037
9      0.0703              1      0.9432                    1      0.27028
10     0.0632              7      0.9013                    3      0.27028
5      0.0601              15     0.8513                    5      0.25741
1      0.0543              2      0.7444                    9      0.25741
20     0.0527              4      0.6847                    7      0.24454
15     0.0495              6      0.65                      4      0.19305
17     0.045               8      0.6414                    19     0.19305
8      0.036               9      0.5045                    16     0.18018
7      0.029               20     0.4878                    2      0.16731
4      0.0272              12     0.3659                    18     0.16731
18     0.025               10     0.3204                    8      0.1287
3      0.0237              17     0.2976                    15     0.1287
13     0.023               3      0.1628                    20     0.1287
16     0.0223              13     0.1144                    12     1.14E-17
2      0.0216              16     0.0923                    6      7.34E-18
14     0.0081              19     0.0386                    11     0
19     0.0068              14     0.035                     14     0


TABLE II. Comparison of the results of the proposed algorithm (virtual edges) and the Taxation method for avoiding anomalies in Case 2 ($\beta = 1$)

Using virtual edges        Taxation
Node   PageRank            Node   PageRank
9      0.508068237         11     0.83086
10     0.508068237         9      0.25352
20     0.381051178         10     0.22903
2      0.265581124         20     0.19944
17     0.254034118         15     0.15968
15     0.254034118         6      0.1495
5      0.173205081         5      0.14569
18     0.161658075         17     0.14155
8      0.15011107          8      0.11547
1      0.138564065         1      0.11197
6      0.138564065         7      0.08907
7      0.127017059         12     0.08748
11     0.103923048         18     0.07921
12     0.069282032         2      0.06521
4      0.046188022         4      0.05567
3      7.50E-17            13     0.0528
13     2.12E-17            3      0.04612
16     1.16E-17            14     0.04612
14     1.02E-17            16     0.0369
19     0                   19     0.02386


Comparing the results of the Taxation method and our
proposed method in Table II, our approach produces more
reasonable outcomes. Node 9 is the junction of two
cycles whose nodes all belong to the SCC part of the
graph, so the random surfer is most likely to be on it. Nodes 10
and 20 have the next highest ranks after node 9 because they
have in-links from node 9. The rank of node 5 cannot be higher
than that of node 17 because node 17 is a member of the cycle
that contains nodes 9 and 10. In the Taxation result, nodes in a
spider-trap, such as 6 and 11, received higher PageRank, and
vertices 2 and 18 received lower PageRank, than in the results of
our proposed approach. For the other vertices, the ranks under
the two methods are either the same or very close to each other.

V. Conclusion

In this paper, the fundamental ideas of Web structure
mining and the Web graph are explained in detail to give a
generic understanding of the data structure of the Web. The
main purpose of this paper is to present a new
PageRank-based algorithm and compare it with previous
algorithms.

The proposed method generalizes the approach of
finding PageRank based on transition probabilities by
considering arbitrary impact rates for both out-links and
in-links, so as to cover the cases in which out-links also
influence the PageRank of nodes. Moreover, it prevents
spider-traps and dead-ends from assigning an unreasonably
high PageRank to themselves. A noticeable
weak point of the previous method is that it assigns an
unreasonably large PageRank to spider-traps and dead-ends and
also reduces the PageRank of SCC vertices. Our approach
solves this problem because, with virtual edges added,
random surfers do not stop at spider-traps and
dead-ends. According to [13], DirichletRank has so far been the
best of the previous methods, capable of
diminishing the impact of link spamming (a special case of
spider-traps) and of the dead-end problem; it is, however, only
applicable to backward analysis. In comparison with that
method, our approach is general to more types of
networks and simpler to understand and implement. Also, by
using the ideas suggested in this paper, PageRanks are
insulated in all cases from the influence of anomalies,
including in/out-tendrils and in/out-components.

The generalization of the PageRank algorithm to include
both the forward and backward links of a node makes this
approach applicable to new domains beyond web mining and search
engines. We are currently exploring the application of the
new generalized algorithm to the analysis of network data, for
instance using PageRank as a measure of a node's activity
score [20] to find communities.

Abstract

Recent research has produced advances in the understanding of
communities within a dynamic social network. A "community" in
this context is defined as a subgraph with a higher internal density
and a lower crossing density with respect to other subgraphs. In this
paper, we describe a novel and efficient distance-based ranking
algorithm, called the "Correlation Density Rank" (CDR), which is
utilized to derive the community tree from the social network and to
develop a tree learning algorithm that is employed to construct an
evolving community tree. We also present an evolution graph of the
organizational structure, through which new insights into the
dynamic network may be obtained. The experiments, conducted on
both synthetic and real datasets, demonstrate the feasibility and
applicability of the framework.

Keywords:  Dynamic  social  network;  Organizational  structure; 
Community  discovery;  Evolution  analysis;  Web  ranking;  Crawling; 
Correlation  Density  Rank. 


1.  Introduction 

Community detection is an important research issue in
social network analysis (SNA), where the objective is to
recognize related sets of members such that intra-community
associations are denser than inter-community associations
[1-10]. Researchers have presented various methods to extract
communities from Social Network (SN) data. In particular,
discovering the organizational structure of communities in an
SN has been identified as an interesting but challenging
problem [11,12]. Examples of important applications include
characterizing potential candidates for viral marketing, finding
members of criminal groups, discovering affinity groups, etc.
[12]. While there has been research on finding key members in
an SN [11-15], the results have limited power to supply a
complete view of the organizational structure.

In the real world, social networks are constantly changing
and evolving. New members may join the network, existing
members may leave the network, and associations among
members constantly change over time. Therefore, the approach
should be capable of supporting the exploration of
organizational structure dynamically. Some earlier research has
provided approaches to detecting communities in a dynamic
social network [16-19]. These approaches discover changes of
communities in an SN, but do not answer questions related to
changing organizational structure, such as who is becoming
more powerful or how the power structure is shifting.

In the workflow area, organization mining has also been the
focus of past research [20-22]. The workflow area research has
emphasized the exploration of organizational structure in the
unit from event logs of information systems. Research efforts
[23-25] have also addressed determining the hierarchy in an
SN, where hierarchy has a similarity to an organizational
structure. In this paper we use the notion of a community tree
data structure to represent organizational structure and its
evolution. This approach is similar to that presented by Qiu and
Lin [26], where the ranking of nodes is based on the PageRank
algorithm with iteration complexity of the order of $\log N$ [27].
However, the technique presented here incurs no iteration
cost and is not susceptible to network anomalies [28]
such as spider-traps and dead-ends, or to the "rich-get-richer"
problem [29]. Consequently, the approach in this paper is
algorithmically efficient and produces communities and
organizational structures with better accuracy.

We propose a novel density measure, the Correlation
Density Rank (CDR), as the basis of community and
organizational structure detection. We apply a density-based
method to describe the relationship between nodes linked by
edges. In comparison with earlier density-based
algorithms [30-32], this approach offers several advantages: (1)
the CDR is a generalized formulation, which permits the
weighting of different correlation types; (2) it employs a more
realistic solution in which important neighbors have higher
priority than lesser ones; and (3) optimization of the CDR
is easier, since the number of parameters needed for tuning is
smaller.

The contributions of this paper are as follows: (1)
developing an algorithm to derive the community tree from the
SN; (2) developing a tree learning algorithm to generate the
evolving community trees; and (3) proposing an approach for
representing the evolution of the organizational structure based
on the evolving community trees.

The rest of the paper is organized as follows. Section 2
introduces the concept of the community tree and proposes an
approach for deriving the community tree from a static social
network. Section 3 presents the methods for analyzing the
evolution of organizational structure in a dynamic SN. Section
4 provides the experimental results. Section 5 offers
concluding remarks.

2. Discovering the organizational structure in a static
social network

2.1  Organization  structure  of  a  network 

Song and Van [33] developed a definition of an organizational
model, where the organizational model consists of
organizational units (e.g., functional units), roles (e.g., duties),
originators, and relationships (e.g., hierarchy). In this paper, we
assume that the organizational structure of an SN is a hierarchy
that represents communities (or units) and subordinations of
members in the SN. However, the derived hierarchical
organizational structure of an SN is not necessarily reflected in
a real-world organization [26]. For instance, an academic
network or a blogosphere need not correspond to any real-world
organization. Likewise, for a network
formed by e-mail communications, we may not be able to
determine exactly to what extent the relationships can be
mapped to the real-world organizational structure. Therefore,
in deriving the organizational relationships, subordination
describes the relationship between two members where the
leader is the most likely and most important destination of
information flow starting from the subordinate; that is, the
subordinate interacts most closely with the leader in
comparison with others, while the leader has a higher
possibility of attracting interaction from nodes across the
entire network than the subordinate does.

The importance of a member in the SN is expressed through
a score, the m-Score, which is equal to the CDR value of the
member. The m-Score of a member reflects how strongly it
attracts interaction from all nodes across the whole
network. The higher the score, the more important the
member. We use a data structure, called the community tree,
to represent the SN organizational structure.

Definition 1. (Community Tree): Let $N = \{n_1, \ldots, n_k\}$ be a
collection of members in an SN. $CT$ is a tree whose root is NULL,
and every member in $N$ is a node in the tree. Each member $n_i$ in
$CT$ has a unique parent node $n_j$, where
m-Score($n_j$) > m-Score($n_i$). If the parent node of
$n_i$ is the root node NULL, $n_i$ is called a core of the tree. A
core and its descendants compose a community.
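
Definition 1 can be illustrated with a small data-structure sketch. The m-Scores and parent links below are hypothetical and are not taken from Fig. 1. The function names are likewise introduced only for this example, and `None` stands in for the NULL root.

```python
from collections import defaultdict

# Hypothetical m-Scores and immediate leaders; None stands for the NULL root.
m_score = {1: 0.9, 2: 0.5, 3: 0.4, 4: 0.2, 5: 0.8, 6: 0.3, 7: 0.1}
parent = {1: None, 2: 1, 3: 1, 4: 2, 5: None, 6: 5, 7: 6}


def check_tree(m_score, parent):
    """Definition 1: every non-root parent has a higher m-Score than its child."""
    return all(
        leader is None or m_score[leader] > m_score[member]
        for member, leader in parent.items()
    )


def find_core(member, parent):
    """Follow parent links upwards until a child of the NULL root is reached."""
    while parent[member] is not None:
        member = parent[member]
    return member


def communities(parent):
    """Group members by core; a core and its descendants form one community."""
    groups = defaultdict(list)
    for member in sorted(parent):
        groups[find_core(member, parent)].append(member)
    return dict(groups)


print(check_tree(m_score, parent))
print(communities(parent))
```

Here members 1 and 5 are the cores, so the tree splits into two communities. How the immediate leader of each member is actually chosen is described later in the paper and is not modeled in this sketch.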

To  derive  the  organizational  structure,  we  calculate  the  m- 
Score  for  every  member  and  then  attempt  to  find  the  immediate 
leader  of  every  member  in  a  network.  Further,  we  construct  the 
community  tree  using  m-Score  and  subordinations. 

After having constructed the SN community tree, we can
discover communities and obtain the SN organizational
structure. An example of a community tree is illustrated in Fig.
1, where node 1 and node 5 are the cores of community 1 and
community 2, respectively.

No: W911NF-12-2-0067 and Army Research Office under Grant Number
W911NF-11-1-0168. Any opinions, findings, conclusions or recommendations
expressed here are those of the author(s) and do not necessarily reflect the
views of the sponsor.


Figure 1. An example of a community tree [26].

An SN may be regarded as a graph indicating information
flows among members. The relationships among SN members can be
obtained by analyzing information flows [34]. The process of
determining SN information flow is similar to a random walk
on a graph [35,36]. Given a graph and a starting point, a
neighbor of the starting point is selected at random and the
walk moves to this neighbor; then a neighbor of this new point
is selected at random, and so on. The random sequence of points
generated in this process is a random walk on the graph. The
expected lengths of random walks on the graph can be used to
derive the randomized shortest paths (RSP) dissimilarity
[37,38]. The RSP dissimilarity, which has its foundation in
statistical physics, may be used to compute the shortest path
distance for all pairs of nodes of a graph in closed form.
Recently, [39] generalized the distance from the RSP framework,
based on the Helmholtz free energy between two states of a
thermodynamic system, into a distance measure called the free
energy distance. We employ the RSP measurement method of [39]
as the distance between nodes, but with one major difference:
we use a customized initial cost for edges so that, while
finding the shortest path between nodes, the random walker
preferentially selects the most important neighbor, resulting
in lower cost and smaller distance.
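The random-walk process described above can be sketched in a few lines. This is only an illustration of a plain, unbiased walk, not the customized-cost walker used in this paper; the function name `random_walk` and the adjacency-list layout are assumptions made for the example.

```python
import random


def random_walk(adjacency, start, steps, seed=None):
    """Return the node sequence visited by a simple random walk.

    adjacency: dict mapping each node to a list of its neighbours
               (hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    path = [start]
    current = start
    for _ in range(steps):
        neighbours = adjacency[current]
        if not neighbours:
            # Dead end: the walk cannot continue from this node.
            break
        # Pick the next point uniformly at random among the neighbours.
        current = rng.choice(neighbours)
        path.append(current)
    return path


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "B"]}
    print(random_walk(graph, "A", steps=5, seed=1))
```

Averaging the lengths of many such walks between two nodes gives an empirical counterpart of the expected path lengths that the RSP framework computes in closed form.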

Combining RSP with the m-Score of every node, we can find
an immediate leader for every member in an SN (see Section
2.3). The SN community tree is derived in this way. Our
framework includes the following steps to derive the
community tree:

a) Employ the novel "Correlation Density Rank" method to
rank nodes, yielding the m-Score for every node in an SN; and,

b) Combine RSP with the m-Score of every node to derive a
community tree.

2.2  Calculating  m-Score 

To calculate the m-Score for every node, we need to
investigate each node's importance in a network. Nodes that
link to many important nodes are themselves important. Such a
process is very similar to PageRank-based algorithms [40].
PageRank is a link analysis algorithm that produces a global
“importance” ranking for every web page by analyzing links
among web pages. A fast and efficient page ranking mechanism
for web crawling and retrieval remains a challenging issue.
Recently, several link-based ranking algorithms, such as
PageRank, HITS, and OPIC [27], have been proposed.

We propose a novel method, Correlation Density Rank (CDR),
based on finding the more frequent and influential RSPs. CDR
treats the entropy of the distances between nodes as a
punishment and uses it to compute the ranks of nodes. There is
more traffic along the shortest paths between nodes when the
distance between them is smaller. As proved in [41], if the
distance between i and j is less than the distance between i
and k, then the effect of i's rank on j is greater than on k;
in other words, the probability that a random surfer reaches j
from i is greater than the probability of reaching k.
Therefore, the objective is to minimize the punishment, so that
a node with lower distance entropy has a higher rank.

The Shannon entropy is a measure of system uncertainty [42].
The larger the Shannon entropy, the more uncertain the system
is. If the CDR value of a node in a complex network is the
smallest, then the uncertainty of its distance distribution
from other nodes is the greatest. Conversely, when the CDR
value of a node is very high, the uncertainty of its distance
distribution from other nodes is the smallest.

Moreover, the more popular nodes are, the more linkages
other nodes tend to have to them or are linked to by them. The
proposed algorithm is analogous to the weighted PageRank
algorithm [43], assigning larger rank values to more important
(popular) nodes instead of dividing the rank value of a node
evenly among its out-link nodes. We assign each out-link node
a value proportional to its popularity (its number of in-links
and out-links). The popularity from the number of in-links and
out-links is recorded as $W^{in}_{ij}$ and $W^{out}_{ij}$,
respectively.


$W^{in}_{ij}$ is the weight of the link between node $n_i$ and $n_j$,
calculated based on the number of in-links of node $n_j$ and the
number of in-links of all reference nodes of node $n_i$:

\[ W^{in}_{ij} = \frac{I_{n_j}}{\sum_{p \in R(n_i)} I_p} \]

where $I_{n_j}$ and $I_p$ represent the in-link frequency of
node $n_j$ and node $p$, respectively. $R(n_i)$ denotes the
reference node list of node $n_i$.
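As a rough illustration, the in-link weight above (and the out-link weight, which has the same form with out-link counts) can be computed from a frequency matrix as follows. This is a sketch under stated assumptions, not the authors' implementation: the helper name `link_weights` is hypothetical, `F[i, j]` is assumed to hold the contact frequency from node i to node j, the in-link and out-link frequencies of a node are taken to be its column and row sums, and the reference list $R(n_i)$ is taken to be the set of nodes that i contacts. The two `eps` adjustments follow the exceptions the text states for dead-end nodes and single-reference nodes.

```python
import numpy as np


def link_weights(F, eps=1e-3):
    """Weighted-PageRank-style link weights from a frequency matrix.

    F[i, j] is the contact frequency from node i to node j (assumed layout).
    Returns (W_in, W_out); entries are 0 where there is no link i -> j.
    """
    F = np.asarray(F, dtype=float)
    linked = F > 0                         # R(n_i): nodes that i links to
    in_freq = F.sum(axis=0)                # I_p: frequency arriving at p
    out_freq = F.sum(axis=1)               # O_p: frequency leaving p
    single = linked.sum(axis=1) == 1       # R(n_i) = {n_j}: add eps to the sum
    in_den = linked.astype(float) @ in_freq + eps * single
    out_den = linked.astype(float) @ out_freq + eps * single
    with np.errstate(divide="ignore", invalid="ignore"):
        W_in = np.where(linked, in_freq[None, :] / in_den[:, None], 0.0)
        W_out = np.where(linked, out_freq[None, :] / out_den[:, None], 0.0)
    # Dead-end target (no outgoing frequency): use eps instead of 0.
    W_out = np.where(linked & (out_freq[None, :] == 0), eps, W_out)
    return W_in, W_out


if __name__ == "__main__":
    F = [[0, 2, 1],
         [1, 0, 0],
         [0, 0, 0]]      # node 2 is a dead end
    W_in, W_out = link_weights(F)
    print(W_in)
    print(W_out)
```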


$W^{out}_{ij}$ is the weight of the link between node $n_i$ and $n_j$,
calculated based on the number of out-links of node $n_j$ and
the number of out-links of all reference nodes of node $n_i$:

\[ W^{out}_{ij} = \frac{O_{n_j}}{\sum_{p \in R(n_i)} O_p} \]

where $O_{n_j}$ and $O_p$ represent the number of out-links of
node $n_j$ and node $p$, respectively. $R(n_i)$ denotes the
reference node list of node $n_i$. These equations have two
exceptions. First, if node $n_j$ is a dead-end (which may be
easily determined from the frequency matrix), we let
$W^{out}_{ij} = \varepsilon$, where $\varepsilon$ is a very small
number less than 1. Second, if $W^{in}_{ij} = W^{out}_{ij} = 1$,
which means $R(n_i) = \{n_j\}$, we add $\varepsilon$ to the sum of
the reference nodes' out/in-link frequencies. An algorithm for
calculating the m-Score of members in a social network is
described as follows.

Algorithm 1. Calculating m-Score for members: Correlation
Density Rank (CDR)

Input: social network G

Output: vector R of m-Scores for all members

1. Initialize the cost distance matrix C:

\[ C[i,j] = \log_{\left(1 - w^{in}_{ij} w^{out}_{ij}\right)} \left(1 - \exp(-\gamma f_{ij})\right) \]

(the logarithm of $(1 - \exp(-\gamma f_{ij}))$ to the base
$(1 - w^{in}_{ij} w^{out}_{ij})$)

2. Find the matrix of RSP dissimilarities by employing the
algorithm of [43]: {

$W \leftarrow P^{ref} \circ \exp(-\beta C)$

$Z \leftarrow (I - W)^{-1}$

(Note  that  y  )->  +n-  +(pi+B-,+  > 

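$S \leftarrow (Z (C \circ W) Z) \div (Z + \varepsilon)$

$\bar{C} \leftarrow S - e\, d_S^{T}$

$\Delta^{RSP} \leftarrow \lambda \bar{C} + (1 - \lambda) \bar{C}^{T}, \quad 0 < \lambda < 1$ }

3. $M \leftarrow$ normalize matrix $\Delta^{RSP}$ on rows

4. For each node $n_i$ ($1 \le i \le k$), compute the entropy of
the related row of matrix M: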


5. Return R

where $f_{ij}$ is the frequency of contacts from node $n_i$ to node
$n_j$. If $f_{ij} = 0$, let $C[i,j] = \infty$ or a very large number.
$P^{ref}$ is the transition probability matrix, in which
$P^{ref}_{ij}$ equals $f_{ij}$ divided by the sum of the frequencies
between node $n_i$ and all its reference nodes. $k$ is the number
of members in the social network (the nodes of G).

The parameters $\gamma$, $\beta$ and $\lambda$ are input values
determined by the user. $\gamma$ controls the effect of frequency on
the cost function, which restricts the cost ratio with respect
to our defined infinite constant. $\beta$ is the influence of the
cost on the walker's selection of a path, and is equal to the
inverse temperature in the Helmholtz free energy of a
thermodynamic system [39]. $\lambda$ is the weight factor by which
the leadership status of members is distinguished according to
the amount of contact density that comes in or goes out.

Also, in step 2, $d_S = \mathrm{diag}(S)$ is the vector of diagonal
elements of $S$, and $e$ is a column vector of ones. Note that
$A \circ B$ and $A \div B$ are elementwise product and division,
respectively.

For calculating step 2, we use the easier way of computing
the matrix Z [38]. The values of $R_i$ ($1 \le i \le k$) indicate the
final m-Scores of members in the social network.
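Steps 1 to 4 of Algorithm 1 can be sketched with NumPy as below. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the link weights are passed in as ready-made arrays, `e` is treated as a vector of ones, the small constant added to Z in the division is omitted, and Shannon entropy with natural logarithm is assumed for step 4. How the row entropy is finally mapped to the m-Score R is not recoverable from the text, so the sketch stops at the entropy.

```python
import numpy as np


def cost_matrix(F, W_in, W_out, gamma=0.1):
    """Step 1: C[i,j] = log of (1 - exp(-gamma f_ij)) to base (1 - w_in w_out)."""
    F = np.asarray(F, dtype=float)
    C = np.full(F.shape, np.inf)            # no contact -> infinite cost
    linked = F > 0
    num = np.log(1.0 - np.exp(-gamma * F[linked]))
    den = np.log(1.0 - W_in[linked] * W_out[linked])
    C[linked] = num / den                   # change of base: ln(x) / ln(b)
    return C


def rsp_dissimilarity(C, P_ref, beta=0.01, lam=0.5):
    """Step 2: RSP dissimilarity matrix from costs and reference transitions."""
    n = C.shape[0]
    finite = np.isfinite(C)
    C0 = np.where(finite, C, 0.0)           # infinite costs carry zero weight
    W = np.where(finite, P_ref * np.exp(-beta * C0), 0.0)
    Z = np.linalg.inv(np.eye(n) - W)
    S = (Z @ (C0 * W) @ Z) / Z              # elementwise division
    C_bar = S - np.outer(np.ones(n), np.diag(S))
    return lam * C_bar + (1.0 - lam) * C_bar.T


def row_entropy(M):
    """Step 4 (assumed Shannon form): entropy of each row, with 0 log 0 = 0."""
    M = np.asarray(M, dtype=float)
    logs = np.log(M, where=M > 0, out=np.zeros_like(M))
    return -(M * logs).sum(axis=1)


if __name__ == "__main__":
    F = np.array([[0, 3, 1], [2, 0, 2], [1, 4, 0]], dtype=float)
    W_in = W_out = np.full((3, 3), 0.5)     # placeholder weights for the demo
    C = cost_matrix(F, W_in, W_out)
    P_ref = F / F.sum(axis=1, keepdims=True)
    D = rsp_dissimilarity(C, P_ref)
    M = D / D.sum(axis=1, keepdims=True)    # step 3: row normalisation
    print(row_entropy(M))
```

The division by Z assumes every node can reach every other node; on a graph that is not strongly connected some entries of Z are zero and would need the kind of small additive constant the algorithm listing uses.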

2.3  Deriving  the  Community  Tree 

The m-Score of every member is combined with the
normalized RSP matrix (M) on the graph to derive the
community tree from an SN. The RSP matrix helps us find the
most likely and closest interaction for each node, and the
m-Scores determine whether there is a leadership relation
between two nodes with the closest interaction.

Algorithm 2. Deriving Community Tree: CT Deriving

Input: Social network G

Output: Community Tree CT

1. $CT \leftarrow [null, \ldots, null]$

2. $R \leftarrow$ Correlation Density Rank (G);

3. For each member $n_i$ {

$k \leftarrow \arg\min_j (M_{ij})$

if R[k] > R[i]

$CT[i] \leftarrow k$

}

4. Return CT

Initiating a random walk at node i, we can find from the
normalized RSP matrix M the ending node j with the most likely
correlation density. If the m-Score of node j is greater than
that of node i, we regard j as the parent node of node i. After
all the operations end, we obtain a community tree CT
represented as an array, where CT[i] indicates the parent node
of node i. A null value of CT[i] indicates that node i has no
immediate leader; hence node i is the core of a community. A
node that has neither an immediate leader nor any members
belonging to it is called a private node.

Node i and its descendants compose a community. The m-Score
of each member shows the member's importance in the
community. The parent of each node is its immediate leader.
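Algorithm 2 translates almost directly into code. The sketch below assumes that M and R have already been computed, excludes a node from being its own closest neighbour (an assumption, since the diagonal of an RSP matrix is zero), and adds a hypothetical helper `communities` that groups nodes by the core they descend from.

```python
import numpy as np


def derive_community_tree(M, R):
    """Return CT as a list: CT[i] is the parent of node i, or None for a core."""
    M = np.asarray(M, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(R)
    CT = [None] * n
    for i in range(n):
        row = M[i].copy()
        row[i] = np.inf               # assumption: a node cannot lead itself
        k = int(np.argmin(row))       # closest interaction partner of i
        if R[k] > R[i]:               # k outranks i, so k is i's leader
            CT[i] = k
    return CT


def communities(CT):
    """Group nodes by the core reached when following parents upward."""
    groups = {}
    for i in range(len(CT)):
        node = i
        while CT[node] is not None:   # terminates: parents have higher m-Score
            node = CT[node]
        groups.setdefault(node, []).append(i)
    return groups


if __name__ == "__main__":
    M = [[0.0, 0.2, 0.8],
         [0.3, 0.0, 0.7],
         [0.6, 0.4, 0.0]]
    R = [0.5, 0.9, 0.1]
    CT = derive_community_tree(M, R)
    print(CT, communities(CT))
```

Because a parent always has a strictly higher m-Score than its child, the parent links cannot form a cycle, which is why the upward traversal in `communities` is safe.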

3.  Exploring  evolution  of  the  organizational  structure 
in  a  dynamic  social  network 

In Section 2, we developed the algorithms to derive a static
community tree from a static SN. However, the static
community tree does not present the evolution of the
organizational structure in a dynamic SN well, because it
does not consider intra time-step evolutions.

To present the evolution of organizational structure in a
dynamic SN, we aggregate the changes of an SN during
different time periods, and then derive community trees. We
further construct the evolving community tree from the two
closest community trees. Hence, the evolving community tree
can accurately present the evolution of organizational
structure over time.

The best-first search algorithm explores a graph by
expanding the most promising node chosen according to a
specified rule [44]. Motivated by the idea of best-first
search, [26] uses the tree edit distance [45] (defined as the
least cost of edit operations needed to change one tree into
another) as a measure of distance (similarity) between two
trees. We propose a new algorithm to derive the evolving
community tree with more efficiency and less complexity than
previous approaches. Because constructing the community tree
in our approach is based on RSP, we only need to compute a
linear combination of the RSP matrices of two static
community trees and use the new RSP matrix to derive the
evolving CT using Algorithm 2, without any iteration. In
contrast, [26] generated a collection of candidate evolving
community trees and then chose the candidate having the
minimum ES score (by means of a scoring function, which
measures distance errors among the evolving CT, the previous
period CT and the current period CT) as the solution of each
iteration.

3.1  Tree  learning  algorithm 

We propose a tree learning algorithm to derive an evolving
community tree from two static community trees. The
construction process is as follows:

(1) Obtain the collection of members in the evolving
community tree, $N_{ce} = N_{ps} \cup N_{cs}$, where $N_{ps}$ and
$N_{cs}$ are the collections of members in the previous time
period's static community tree and the current time period's
one, respectively;

(2) Compute the RSP for all pairs of members in the evolving
community tree, where $\alpha$ is a smoothing factor:

\[ \Delta^{RSP}_{ce} = (1 - \alpha) \cdot \Delta^{RSP}_{ps} + \alpha \cdot \Delta^{RSP}_{cs} \]

For those members that appear in the evolving community tree
but not in the current community tree, if their m-Scores are
less than a threshold $\theta$, we regard them as retired members
and remove them from the evolving community tree;

(3) Following the definition of Algorithm 1, given the
$\Delta^{RSP}_{ce}$ matrix, we can compute the $R_{ce}$ and $M_{ce}$
matrices. Then, by applying them in Algorithm 2, we construct
the evolving community tree.
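Steps (1) and (2) of this construction amount to a weighted blend of two matrices plus a filter on members. The sketch below is illustrative only: the function names are hypothetical, both RSP matrices are assumed to be already indexed over the same union member list (the text does not say how pairs missing from one period are filled), and the assignment of $\alpha$ to the current period follows the usual smoothing convention, which is an assumption here.

```python
import numpy as np


def blend_rsp(D_prev, D_curr, alpha=0.7):
    """Smoothed RSP matrix: (1 - alpha) * previous + alpha * current."""
    D_prev = np.asarray(D_prev, dtype=float)
    D_curr = np.asarray(D_curr, dtype=float)
    if D_prev.shape != D_curr.shape:
        raise ValueError("matrices must be indexed over the same member list")
    return (1.0 - alpha) * D_prev + alpha * D_curr


def retire_members(members, current_members, m_score, theta):
    """Drop members absent from the current period whose m-Score is below theta."""
    current = set(current_members)
    return [m for m in members if m in current or m_score[m] >= theta]


if __name__ == "__main__":
    prev = [[0.0, 2.0], [2.0, 0.0]]
    curr = [[0.0, 4.0], [4.0, 0.0]]
    print(blend_rsp(prev, curr, alpha=0.7))
    scores = {"A": 0.5, "B": 0.2, "C": 0.05}
    print(retire_members(["A", "B", "C"], ["A", "B"], scores, theta=0.1))
```

The blended matrix can then be row-normalized and passed to the community-tree derivation exactly as a static RSP matrix would be, which is why no iteration over candidate trees is needed.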


Algorithm 3. Learning evolving community tree:
ECT_Learning

Input: RSP matrices $\Delta^{RSP}_{ps}$, $\Delta^{RSP}_{cs}$
Output: Evolving community tree $CT_{ce}$

1. $N_{ce} \leftarrow N_{ps} \cup N_{cs}$

2. $\Delta^{RSP}_{ce} \leftarrow (1 - \alpha) \cdot \Delta^{RSP}_{ps} + \alpha \cdot \Delta^{RSP}_{cs}$

3. $N_{ce} \leftarrow N_{ce} - \{n_i \in N_{ce} - N_{cs} \mid \text{m-Score}(n_i) < \theta\}$

4. Do steps 3 and 4 of Algorithm 1 to find $R_{ce}$, $M_{ce}$

5. Employ Algorithm 2 to find $CT_{ce}$

6. Return $CT_{ce}$

3.2 Exploring dynamic social network

After deriving a series of evolving community trees, we
exploit these trees to discover the evolution path of the
organizational structure and to study the properties of the
dynamic SN.

We can consider four types of relationships among
communities to generate an evolution graph that represents
the evolution of the organizational structure. Fig. 2 provides
examples to illustrate the relationships among communities
[26].


Figure 2. Relationships between communities.

Combining the evolution graph and dynamic SN properties, we
can obtain insights into the dynamic SN. Definitions 2-4
define key properties of the dynamic SN [26].

Definition 2. (life-line): Let $C = \{c_1, \ldots, c_n\}$ be a
collection of communities. For each community $c_i \in C$ with
$i < n$, $c_{i+1}$ is the evolving entity of $c_i$. We say
$\{c_1, \ldots, c_n\}$ is a life-line of the community $c_1$.
Community $c_k \in C$ is called the entity of the life-line. A
life-line depicts an evolving process of one community in the
dynamic SN.

Definition 3. (supporter): Given a life-line
$CL = \{c_1, c_2, \ldots, c_n\}$, we call the members supporters of
CL if they appear in CL not less than $\delta$ times, with
$\delta \le n$.


In Definition 3, $\delta$ is a parameter set by the user. If there
is a life-line $CL = \{c_1, c_2, c_3, c_4\}$ and $\delta = 3$, only
members appearing in the life-line not less than 3 times are
supporters of CL. Exploring life-lines in the evolution graph
and their supporters helps to better understand dynamic social
networks. For example, we can discover the backbone of a
criminal group or detect loyal members in a forum over time.
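The supporter definition can be checked on a toy life-line. In this sketch a life-line is represented as a list of member sets, one per time period; that representation and the function name `supporters` are assumptions made for illustration.

```python
from collections import Counter


def supporters(life_line, delta):
    """Members appearing in at least `delta` communities of the life-line."""
    # Count each member once per community, even if listed twice.
    counts = Counter(m for community in life_line for m in set(community))
    return {m for m, c in counts.items() if c >= delta}


if __name__ == "__main__":
    life_line = [{"A", "B", "C"}, {"B", "C"}, {"B", "C", "D"}, {"B", "D"}]
    print(sorted(supporters(life_line, delta=3)))
```

With four entities and $\delta = 3$, B (four appearances) and C (three) qualify, while D (two) and A (one) do not.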

Definition 4. (Activeness of community): Let node p be
the core of community c. We use the m-Score of p to
indicate the activeness of community c. Activeness of
community is a metric for the extent to which associations
occur among members of the community over one period of
time. We can use activeness of community to reveal hot
communities, which may reflect on-going hot topics in a
forum or new activities in a criminal group.


4.  Experiments 

To implement our approach, we first consider a small,
dynamic, synthetic network (Fig. 3) over five time intervals.


Figure 3. A small dynamic network with frequency contacts between nodes.
(Panels: frequency contacts on time period 3; frequency contacts on time
period 4.)

After employing Algorithms 1 and 2, we find the static
community trees for each time period separately, as shown in
Fig. 4. The arrows indicate leadership, and the parameters are
set to $\gamma = 0.1$, $\varepsilon = 0.001$, $\lambda = 0.5$,
$\beta = 0.01$, $\alpha = 0.7$.


Figure 4. Static community trees for each period of time separately.
(Panels: static community tree on time periods 1, 2, 3, 4 and 5.)

At this stage, we have information about leadership during
each time period separately, which is not enough for a
complete analysis. So, to achieve a good understanding of the
network's changing trends during intra time-step evolutions,
we employ Algorithm 3 to derive the evolving community trees
shown in Fig. 6.


Figure 6. The evolving community trees for the synthetic network

The analysis of the outputs clarifies some events that
happened over time:

a) From T1 to T2, the leader of member E changes from
C to B.

b) From T2 to T3, the leader of members A and E changes
from B to D.

c) From T3 to T4: 1) splitting on B, C; 2) evolving
between C, D; 3) emerging of the core D related to
members B, E.

d) From T4 to T5, splitting on member E.

Also, the trend of members' activity may be seen in Fig. 5.


Figure 5. Evolving member’s activity for the synthetic network
(x-axis: periods of time).


Figure 7. Evolution of communities for the synthetic network.

Fig. 7 shows the evolution map of communities, in which the
community whose core is D has the greatest stability among all
communities, and its supporters are members B and C.
(Stability is the number of supporters divided by the size,
where size is the average number of members in each community
in the life-line.)
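The stability measure defined in the parenthesis above is a simple ratio. The sketch below assumes a life-line is given as a list of member sets and that supporters are counted with the threshold $\delta$ of Definition 3; the function name `stability` is hypothetical.

```python
from collections import Counter


def stability(life_line, delta):
    """Number of supporters divided by the average community size."""
    counts = Counter(m for community in life_line for m in set(community))
    n_supporters = sum(1 for c in counts.values() if c >= delta)
    # Size: average number of members per community in the life-line.
    size = sum(len(set(c)) for c in life_line) / len(life_line)
    return n_supporters / size


if __name__ == "__main__":
    life_line = [{"A", "B", "C"}, {"B", "C"}, {"B", "C", "D"}, {"B", "D"}]
    print(stability(life_line, delta=3))
```

In this toy life-line there are two supporters and the average size is 2.5, so the stability is 0.8.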

While the most direct application of this method is in
social networks, the technique is appropriate for all types of
complex networks, and the type of network does not influence
the results. We used real frequency data from a computer
network of 288 nodes to evaluate this approach. Data for a
period of 100 seconds, divided into 10 equal periods of 10
seconds each, is used to construct the evolving community
trees and draw the evolution map of communities.

After employing Algorithms 1, 2 and 3, we obtained the
evolving community trees shown in the Appendix. The arrows
indicate hierarchical leadership between nodes, and the
parameters are set to $\gamma = 0.1$, $\varepsilon = 0.001$,
$\lambda = 0.5$, $\beta = 0.01$, $\alpha = 0.7$.


Figure  8.  Evolution  of  communities  for  the  real  data  set. 


Fig. 8 shows the evolution map of communities, in which
the community whose core is node 82 has a size of 185, a longer
life-line (30 seconds) and the largest stability, 0.7, among
all communities. As expected, the algorithm identifies
routers and hubs as the cores of communities.


5.  Conclusions 

Exploring organizational structure in a dynamic social
network has a broad range of applications, such as monitoring
gang activities, fraud detection, and improving the performance
of viral marketing. In this paper, we present our research
effort in extracting organizational structure from such data to
obtain a better understanding of the social network. We
formalize a community tree data structure for the purpose of
representing the social network organizational structure, and
propose a framework to explore the dynamic behavior of the
participants of the community. The framework is composed of
the following main parts: (1) defining the “Correlation Density
Rank” to rank the nodes and acquire a community tree from the
static social network; and (2) a tree learning algorithm, which
employs the tree edit distance as a scoring function, to
generate the evolving community tree. These algorithms were
applied to a synthetic and a real-world dataset, and produced
good results. The experiments show that the framework can
present the organizational structure of a social network well.

We obtained the following insights experimentally: (1)
communities with a long life-line and great stability likely
correspond to a real organization; and (2) the cores in an
organizational structure are, in general, either the leaders
of the organization or the agents of these leaders. Although
it is possible that the organizational structure discovered
from a social network is not perfectly in line with the
real-world organization, the approach described here helps
reach new understandings of the organization based on the
power of attracting information flow and the closeness of
interaction.


Abstract

Researchers have typically concentrated on analyzing what
happens internally in a complex network and using this to distinguish
between nodes. However, there has been less effort towards
comparing different networks. In this paper, we propose a
novel approach to rank alternative complex networks based on their
performance. We treat this as a ranking problem in decision
analysis, using the occurrence of positive/negative frequent events
as criteria and the TOPSIS method to rank alternatives. In order
to assign a score to the networks for each criterion, a statistical
method that estimates the expected value of positive/negative
frequent events on a random node is presented. The proposed
technique is efficient in terms of algorithm complexity and is capable
of discriminating events occurring between important nodes from
those between less significant nodes. The experiments, conducted on
several synthetic networks, demonstrate the feasibility and
applicability of the ranking methodology.

Keywords:  Complex  Network;  Network  Performance  Rank 
(NPR);  Correlation  Density  Rank  (CDR);  Multi-Criteria  Decision 
Making  (MCDM);  TOPSIS  method;  Renyi  entropy;  Gaussian 
influence  function. 

1.  Introduction 

In recent years, researchers have mostly focused on the
internals of complex networks, developing techniques such as
detecting communities [1-12], ranking nodes [13-18], and
finding outliers [19-21]. Less attention has been given to
performance comparisons between different networks. This
problem manifests itself in different domains, including
computer, telecommunication, electrical circuit, supply chain
and social networks, where there is a need to evaluate
different network architectures, equipment, protocols, etc.
under the constraint that it is not possible to replicate
exactly the same scenarios in each case. In this study, we
treat this objective as a ranking problem in the Multi-Criteria
Decision Making (MCDM) [22-26] field, based on the occurrence
of positive/negative frequent events as the criteria.

Since any event occurs between two nodes of a network,
and the nodes cannot be considered independent variables,
statistical analysis to compute the probability of
failed/successful occurrences between random nodes throughout
the network would be very difficult. This paper proposes a
novel approach to approximate the variance of each type of
event per network, which is used to estimate the expected
values of the events between two random nodes.

The contributions of this paper are as follows: (1) defining
the network performance comparison problem as a Multi-Criteria
Decision Making (MCDM) ranking problem; (2) developing an
approach to compute the diversity of density (DOD) of events
in networks to evaluate the variance, where the events
happening between important nodes are positively discriminated
over events between less significant nodes; and (3)
approximating the probability distribution and the expected
value of occurrences on a random node for scoring each network
per criterion.

The  rest  of  the  paper  is  organized  as  follows.  Section  2 
presents  the  general  framework  and  all  approaches  needed  for 
ranking  alternative  networks.  Section  3  provides  the 
experimental  results.  Section  4  offers  concluding  remarks. 

2.  Proposed  approach  to  compare  between  networks 

2.1  General  framework 

In order to compare the performance of networks, we treat
the task as a ranking problem in MCDM. An MCDM problem can be
concisely expressed in matrix format as

Figure 1. A decision matrix in MCDM problem model

where $A_1, A_2, \ldots, A_m$ are the possible alternative networks
to be ranked, $c_1, c_2, \ldots, c_n$ are the criteria with which
alternative performance is measured, $x_{ij}$ is the score of
alternative $A_i$ with respect to criterion $c_j$, and $w_j$ is the
weight of criterion $c_j$.

Various attributes can be selected as criteria, but here we
focus on positive/negative frequent events which may occur
between nodes during a given, sufficiently long period of time
and affect the network's performance. For instance, a
re-transaction or any failed operation between nodes is a
negative event in the network which decreases the network's
efficiency. Conversely, the higher the probability density of
successful, positive occurrences, the more efficient the
network will be.
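To make the decision-matrix formulation concrete, the sketch below ranks alternatives with the standard textbook form of TOPSIS, treating positive events as benefit criteria and negative events as cost criteria. It is an illustration only: the variant used in this paper is the one described in its Section 2.3 and may differ in normalization or weighting, and the function name `topsis` and the example scores are made up for the demonstration.

```python
import numpy as np


def topsis(X, weights, benefit):
    """Closeness of each alternative to the ideal solution (higher is better).

    X:       m x n decision matrix, X[i, j] = score of alternative i on criterion j
    weights: length-n criterion weights
    benefit: length-n booleans, True for positive (benefit) criteria,
             False for negative (cost) criteria
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    # Vector-normalise each criterion column, then apply normalised weights.
    V = X / np.linalg.norm(X, axis=0) * (w / w.sum())
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)   # distance to the ideal solution
    d_neg = np.linalg.norm(V - worst, axis=1)   # distance to the anti-ideal
    return d_neg / (d_pos + d_neg)


if __name__ == "__main__":
    # Three hypothetical networks scored on one positive and one negative event.
    X = [[10.0, 1.0],
         [5.0, 5.0],
         [1.0, 10.0]]
    scores = topsis(X, weights=[0.5, 0.5], benefit=[True, False])
    print(scores, np.argsort(-scores))
```

An alternative that is best on every criterion coincides with the ideal solution and receives a closeness of 1, while one that is worst on every criterion receives 0.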


So, to establish the decision matrix, the following steps
are needed:

a) Select the collection of criteria.

b) Score the networks on the criteria.

After that, the TOPSIS method, explained in Section 2.3, is
used to rank the alternative networks.

2.2 Scoring networks on criteria

To score networks on criteria, we propose a new
density-based approach which computes the global DOD of the
given positive/negative frequent event (criterion) for each
network. A Gaussian distribution is employed, with the average
number of events per unit as the mean parameter and the
approximated DOD as the variance parameter, to estimate the
expected value of the event frequency between two random nodes
per network during a given, sufficiently long time period.

In order to compute the global DOD on a given criterion, we
use the modified “Correlation Density Rank” method [27], which
finds the probability density distribution of the related
frequent event on all nodes, and then we utilize the Renyi
entropy [28] to capture the global unpredictability or
diversity of these densities over the whole network.

2.2.1. Correlation Density Rank

We use the Correlation Density Rank (CDR) [27], which
finds the more frequent and influential randomized shortest
paths (RSP). CDR treats the distance between nodes as a
punishment and uses it to compute the probability density of
nodes. There is more traffic along the shortest paths between
nodes when the distance between them is smaller. Therefore,
the objective is to minimize the punishment, so that a node
with a high density probability has a higher rank.

Moreover, the more popular nodes are, the more linkages
other nodes tend to have to them or are linked to by them. The
proposed algorithm is analogous to the weighted PageRank
algorithm [29, 30], assigning larger rank values to more
important (popular) nodes instead of dividing the rank value of
a node evenly among its out-link nodes. We assign each
out-link node a value proportional to its popularity (its
number of in-links and out-links). The popularity from the
number of in-links and out-links is recorded as $W^{in}_{ij}$ and
$W^{out}_{ij}$, respectively.

$W^{in}_{ij}$ is the weight of the link between node $n_i$ and $n_j$,
calculated based on the number of in-links of node $n_j$ and the
number of in-links of all reference nodes of node $n_i$:

\[ W^{in}_{ij} = \frac{I_{n_j}}{\sum_{p \in R(n_i)} I_p} \tag{1} \]

where $I_{n_j}$ and $I_p$ represent the in-link frequency of
node $n_j$ and node $p$, respectively. $R(n_i)$ denotes the
reference node list of node $n_i$.

$W^{out}_{ij}$ is the weight of the link between node $n_i$ and $n_j$,
calculated based on the number of out-links of node $n_j$ and
the number of out-links of all reference nodes of node $n_i$:

\[ W^{out}_{ij} = \frac{O_{n_j}}{\sum_{p \in R(n_i)} O_p} \tag{2} \]

where $O_{n_j}$ and $O_p$ represent the number of out-links of
node $n_j$ and node $p$, respectively. $R(n_i)$ denotes the
reference node list of node $n_i$. These equations have two
exceptions. First, if node $n_j$ is a dead-end (which may be
easily determined from the frequency matrix), we let
$W^{out}_{ij} = \varepsilon$, where $\varepsilon$ is a very small
number less than 1. Second, if $W^{in}_{ij} = W^{out}_{ij} = 1$,
which means $R(n_i) = \{n_j\}$, we add $\varepsilon$ to the sum of
the reference nodes' out/in-link frequencies.

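An algorithm for calculating the probability density of the
related frequent event for all members in a complex network is
described as follows.

Algorithm 1. Correlation Density Rank (CDR):

Input: social network G

Output: vector of probability density distribution CDR

a) Initialize the cost distance matrix C:

\[ C[i,j] = \log_{\left(1 - w^{in}_{ij} w^{out}_{ij}\right)} \left(1 - \exp(-\gamma f_{ij})\right) \tag{3} \]

(the logarithm of $(1 - \exp(-\gamma f_{ij}))$ to the base
$(1 - w^{in}_{ij} w^{out}_{ij})$)

b) Find the matrix of RSP dissimilarities by employing the
algorithm of [29]: {

\[ W \leftarrow P^{ref} \circ \exp(-\beta C) \tag{4} \]

\[ Z \leftarrow (I - W)^{-1} \tag{5} \]

\[ S \leftarrow (Z (C \circ W) Z) \div (Z + \varepsilon) \tag{6} \]

\[ \bar{C} \leftarrow S - e\, d_S^{T} \tag{7} \]

\[ \Delta^{RSP} \leftarrow 0.5\,(\bar{C} + \bar{C}^{T}) \tag{8} \]

}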

c) M ← Normalize matrix $\Delta^{RSP}$ on columns.

d) For each node $n_j$ ($1 \le j \le k$), compute the inverse of the entropy [31] of the related column of matrix M ($\sigma_j$ is the j-th kernel scale parameter, which describes the influence of node $n_j$ within its neighborhood; we optimize $\sigma$ for each node to make the density values as different as possible):

$$ e_j \leftarrow -\sum_{i=1}^{k} M_{ij} \ln M_{ij} $$  (9)

$$ \sigma_j \leftarrow \frac{1}{e_j} $$  (10)

e) Calculate the density function that results from a Gauss influence function [32]:

$$ cdr_i \leftarrow \sum_{j=1}^{k} \exp\left(-\frac{(\Delta^{RSP}_{ij})^2}{2\sigma_j^2}\right) $$  (11)

f) Normalize the Correlation Density Rank vector (we can sort all the nodes in descending order according to their CDR values):

$$ CDR_i \leftarrow \frac{cdr_i}{\sum_{m=1}^{k} cdr_m} $$  (12)

g) Return CDR.

Here $f_{ij}$ is the frequency from node $n_i$ to node $n_j$. If $f_{ij} = 0$, let $C[i,j] = \infty$ or a very large number. $P^{ref}$ is the transition probability matrix, where $P^{ref}_{ij}$ equals $f_{ij}$ divided by the sum of the frequencies between node $n_i$ and all of its reference nodes, and k is the number of members in the social network (the nodes of G).

The parameters $\gamma$ and $\beta$ are input values chosen by the user. $\gamma$ controls the effect of frequency on the cost function, which restricts the cost ratio with respect to our defined infinite constant. $\beta$ is the influence of the cost on the walker's selection of a path, and equals the inverse temperature in the Helmholtz free energy of a thermodynamical system [33].

Also, in step b), $d_S = diag(S)$ is the vector of diagonal elements of S, and e is the identity matrix. Note that $A \circ B$ and $A \div B$ are the elementwise product and division, respectively.

For calculating step b), we use the easier way of computing the matrix Z [34]. The values of $CDR_j$ ($1 \le j \le k$) indicate the final normalized density rank of the members of the complex network, which are considered as the probability density distribution on nodes.

2.2.2. Measure of the global unpredictability/DOD for each criterion per network

The Shannon entropy is a measurement of system uncertainty, unpredictability, diversity and randomness [31] and has been used in statistics and information theory to develop measures of information content [35]. The larger the Shannon entropy is, the greater the uncertainty/unpredictability/randomness and the lower the diversity of the system. Shannon entropy is also the classical measure of information content and is defined for an n-dimensional probability density (PD) distribution P(x) as:

$$ H(P) = -\int_{-\infty}^{\infty} P(x) \log P(x)\,dx $$  (13)


Since several time-frequency representations can take negative values, the use of the more classical Shannon information as a measure of complexity is prohibited (due to the presence of the logarithm within the integral in (13)), and some authors [28, 36-38] have proposed the use of a relaxed measure of entropy known as the Renyi entropy of order $\alpha$:

$$ H_\alpha(P) = \frac{1}{1-\alpha} \log \frac{\int P^{\alpha}(x)\,dx}{\int P(x)\,dx} $$  (14)


Following Baraniuk, the passage from the Shannon entropy $H$ to the class of Renyi entropies $H_\alpha$ involves only the relaxation of the mean value property from an arithmetic to an exponential mean, and thus in practice $H_\alpha$ behaves much like $H$. The Shannon entropy can be recovered as

$$ \lim_{\alpha \to 1} H_\alpha(P) = H(P). $$  (15)

So, in order to measure the global DOD/unpredictability of each network, we can employ the CDR vector as the probability density distribution on nodes in the Renyi entropy formula. Thus, to score each network on each criterion, we compute the following measure:

$$ H_k^{c_i} = \frac{1}{1-\alpha} \log_2 \frac{\sum_{j=1}^{N_k} CDR_j^{\alpha}}{\sum_{j=1}^{N_k} CDR_j} $$  (16)

where $H_k^{c_i}$ is the unpredictability of network number k on the event related to criterion $c_i$, and $N_k$ is the number of nodes in network number k. The CDR vector is the one computed for the given network and event, and $\alpha$ is the order of the Renyi entropy, which we can set to 3.
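As a minimal sketch of Eq. (16), the following Python function evaluates the Renyi entropy of a density vector. The function name and the two example vectors are hypothetical and serve only to illustrate the behavior; the order defaults to 3, as suggested in the text.

```python
import math

def renyi_entropy(density, alpha=3.0):
    """Renyi entropy (base 2) of a non-negative density vector, as in Eq. (16)."""
    if alpha == 1.0:
        raise ValueError("alpha = 1 is the Shannon limit; use alpha != 1")
    total = sum(density)
    power_sum = sum(p ** alpha for p in density)
    return math.log2(power_sum / total) / (1.0 - alpha)

# Hypothetical density vectors over four nodes (illustrative values only).
uniform = [0.25, 0.25, 0.25, 0.25]   # every node has the same density
skewed = [0.85, 0.05, 0.05, 0.05]    # one node dominates

h_uniform = renyi_entropy(uniform)   # log2(4) = 2 bits, the largest value for 4 nodes
h_skewed = renyi_entropy(skewed)     # smaller, because the density is concentrated
```

The uniform vector gives the largest entropy and the concentrated vector a smaller one, so the inverse of this value grows with the asymmetry of the density values.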

If the density value of every node in a complex network is the same, the uncertainty of the original density distribution is greatest. Conversely, when the density values of the nodes are very asymmetric, the uncertainty of the original density distribution is smallest. Thus, the inverse of the uncertainty/unpredictability is a good measure of the DOD across the network.

2.2.3. Estimate the expected value rate of each event per network

As mentioned above, the Renyi entropy reflects the differences among the nodes' density values. The more the density values differ, the smaller the Renyi entropy is. So, we can take the inverse of the Renyi entropy times the mean as a measure of the variance of the frequent event across nodes.

The normal (or Gaussian) distribution is a very commonly occurring continuous probability distribution, a function that gives the probability that an observation in some context will fall between any two real numbers. Normal distributions are extremely important in statistics and are often used in the natural and social sciences for real-valued random variables whose distributions are not known [39]. Furthermore, considering a Gaussian distribution with mean $\mu_k^{c_i}$ and variance $(\sigma_k^{c_i})^2$ helps us better understand the probability distribution of the given frequent event across the k-th network:

$$ F_k(x, \mu_k^{c_i}, \sigma_k^{c_i}) = \frac{1}{\sigma_k^{c_i}\sqrt{2\pi}} \exp\left(-\frac{(x - \mu_k^{c_i})^2}{2(\sigma_k^{c_i})^2}\right), \quad x \ge 0 $$  (18)

For instance, the probability that the given event does not occur on a random node in the k-th network is given by:

$$ p(x = 0) = F_k(0, \mu_k^{c_i}, \sigma_k^{c_i}) $$  (19)

Moreover, the expected value of the event on two random nodes in the network's Gaussian distribution is a good measure for scoring networks on the related criterion. Thus, we have

$$ S_k^{c_i} = \int_0^{\infty} x\,F_k(x, \mu_k^{c_i}, \sigma_k^{c_i})\,dx = \int_0^{\infty} \frac{x}{\sigma_k^{c_i}\sqrt{2\pi}} \exp\left(-\frac{(x - \mu_k^{c_i})^2}{2(\sigma_k^{c_i})^2}\right) dx $$  (20)

where $S_k^{c_i}$ is the score of the k-th network on criterion $c_i$. After this approach has been applied to all networks and they have been scored on all criteria, the decision matrix is constructed so that the TOPSIS method can be applied to rank the networks' performance. Note that before the TOPSIS steps begin, the decision matrix is normalized on rows to obtain the ratio of expected values for each network, because networks may have different global expected values of all event types on a random node. So, for a fair comparison between them, the ratio of expected values is a better measure to evaluate.

2.3 TOPSIS algorithm

Many ranking methods have been proposed to solve multiple criteria decision making (MCDM) problems. One of the well-known ranking methods for MCDM, the technique for order preference by similarity to ideal solution (TOPSIS) [40-45], was first proposed by Hwang and Yoon [46]. The logic of the TOPSIS approach is to define the ideal and anti-ideal solutions [43], based on the concept of relative closeness: the shorter (longer) the distance of alternative i to the ideal (anti-ideal) solution, the higher its priority [47]. The procedure of TOPSIS can be expressed in a series of steps:

(1) Calculate the decision matrix normalized on columns. The normalized value $n_{ij}$ is calculated as

$$ n_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i} x_{ij}^2}} $$  (21)

where $x_{ij}$ is the entry of the decision matrix for alternative i and criterion j.

(2) Calculate the weighted normalized decision matrix. In this paper, the weights of the objective criteria, obtained with the entropy weighting method [47], are applied to the generalized algorithm. The weighted normalized value $v_{ij}$ is calculated as

$$ v_{ij} = w_j n_{ij} $$  (22)

where $w_j$ is the weight of the j-th attribute or criterion, and $\sum_{j} w_j = 1$.

(3) Determine the positive ideal and negative ideal solution.
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Where  I  is  associated  with  benefit  criteria,  and  J  is 
associated  with  cost  criteria. 

(4) Calculate the separation measures, using the m-dimensional Euclidean distance. The separation of each alternative from the ideal solution is given as

$$ d_i^{+} = \Bigl\{ \sum_{j} (v_{ij} - v_j^{*})^2 \Bigr\}^{1/2}, \quad i = 1, 2, \dots, m. $$  (24)

Similarly, the separation from the negative ideal solution is given as

$$ d_i^{-} = \Bigl\{ \sum_{j} (v_{ij} - v_j^{-})^2 \Bigr\}^{1/2}, \quad i = 1, 2, \dots, m. $$  (25)

(5) Calculate the relative closeness to the ideal solution. The relative closeness of the alternative $A_i$ with respect to $A^{*}$ is defined as

$$ R_i = d_i^{-} / (d_i^{+} + d_i^{-}), \quad i = 1, \dots, m. $$  (26)

Since $d_i^{-} \ge 0$ and $d_i^{+} \ge 0$, then, clearly, $R_i \in [0, 1]$.


(6) Rank the preference order. To rank the networks using this index, we sort them by their relative closeness values in decreasing order.

The  basic  principle  of  the  TOPSIS  method  is  that  the  chosen 
alternative  should  have  the  “shortest  distance”  from  the 
positive  ideal  solution  and  the  “farthest  distance”  from  the 
negative  ideal  solution.  The  TOPSIS  method  introduces  two 
“reference”  points,  but  it  does  not  consider  the  relative 
importance  of  the  distances  from  these  points. 
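The six steps can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: it assumes the column normalization of step (1) is the usual vector normalization, and the function and variable names are hypothetical. The input values are the expected values reported in Table 2 and the entropy weights reported in Table 3, with the row normalization described at the end of Section 2.2.3 applied first.

```python
import math

def topsis(matrix, weights, benefit):
    """Relative closeness of each alternative (row) to the ideal solution.

    matrix  : list of rows, one per alternative, one column per criterion
    weights : criterion weights summing to 1
    benefit : True for benefit criteria (larger is better), False for cost
    """
    n_crit = len(weights)
    # Step 1: vector-normalize each column (assumed form of the normalization).
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    # Step 2: weighted normalized values v_ij = w_j * n_ij.
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Step 3: positive ideal and negative ideal solutions.
    ideal = [max(r[j] for r in v) if benefit[j] else min(r[j] for r in v)
             for j in range(n_crit)]
    anti = [min(r[j] for r in v) if benefit[j] else max(r[j] for r in v)
            for j in range(n_crit)]
    # Steps 4-5: Euclidean separations and relative closeness.
    closeness = []
    for r in v:
        d_plus = math.sqrt(sum((r[j] - ideal[j]) ** 2 for j in range(n_crit)))
        d_minus = math.sqrt(sum((r[j] - anti[j]) ** 2 for j in range(n_crit)))
        closeness.append(d_minus / (d_plus + d_minus))
    return closeness

# Expected values of successful / failed events for networks A-D (Table 2).
names = ["A", "B", "C", "D"]
scores = [[10.0045, 100.991], [8.3387, 2.10188], [7.52409, 3.7611], [6.822, 1.28134]]
# Row normalization: each network's share of successful vs. failed events.
ratios = [[x / sum(row) for x in row] for row in scores]
closeness = topsis(ratios, weights=[0.6425, 0.3575], benefit=[True, False])
# Step 6: rank by decreasing relative closeness.
ranking = [name for _, name in sorted(zip(closeness, names), reverse=True)]
```

Under these assumptions network D coincides with the positive ideal solution and network A with the negative ideal one, so their closeness values are 1 and 0, and networks B and C fall in between, in the same order as reported in Table 4. The sketch does not guard against the degenerate case in which all alternatives are identical and both separations are zero.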

3.  Experiments 

To implement our approach, we designed four synthetic computer network architectures and recorded the successful and failed frequencies as positive and negative events, respectively, during a sample time period; these are shown in Figure 2. The networks' data clearly show how our method behaves in different situations. For instance, networks A and B have the same number of successful and failed events, but the DOD of failed events in network A is higher than in network B, so it seems that node 8 in network A has a critical problem and that the probability of failed events between node 8 and any other node is high.

After employing the correlation density rank and the Renyi entropy, we found the global unpredictability and variance of events per network, which are listed in Table 1 along with other general information.


Table 1. Network properties and their unpredictability, mean and variance results by the proposed method.


Network 

Number 

Successful  event! 

Fail  events 

Name 

of  nodr\  | 

V tm 

V— “ 

Nice* 

v— “ 

Network  A 

10 

I0Q  L44I0 

10 

1757 

25 

0.01 

15 

!<• 

Network! 

12 

IM  3.8501 

1313 

1732 

25 

1.7552 

1083 

1.184 

Network  C 

1 

W  2JB71 

IS 

15735 

30 

11121 

175 

1.7754 

Network  D 

t 

M  1.4401 

1*447 

4A2733 

10 

0.7735 

I.1III 

1.4002 

[Figure 2 panels: successful events and failed events for Network A, Network B, Network C and Network D]

Figure 2. Four synthetic computer network architectures with their successful and failed frequencies during the sample time period.


With the estimated mean and variance parameters, we can consider the networks' probability distributions for both successful and failed events (Figure 3 and Figure 4). For example, in Figure 4 the intersection of each probability distribution curve with the Y axis indicates the probability that no failed event occurs on a random node of the related network. In this case, networks D, B, C and A have, in descending order, the highest probability that no failed event occurs on a random node.
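The two quantities discussed here, the height of the Gaussian curve at the Y axis and the expected value over the non-negative axis used for scoring, can be computed as in the sketch below. The function names and the parameter values are hypothetical; the expected value uses the standard closed form $\mu\,\Phi(\mu/\sigma) + \sigma\,\varphi(\mu/\sigma)$ for the integral of $x$ times a normal density over $[0, \infty)$, where $\Phi$ and $\varphi$ are the standard normal distribution and density functions.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def expected_value_on_nonnegative_axis(mean, std):
    """Closed form of the integral of x * pdf(x) over [0, infinity)."""
    z = mean / std
    std_normal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    std_normal_pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return mean * std_normal_cdf + std * std_normal_pdf

# Hypothetical parameters, chosen only to illustrate the calculation.
mean, std = 2.0, 1.5
density_at_zero = gaussian_pdf(0.0, mean, std)   # where the curve meets the Y axis
score = expected_value_on_nonnegative_axis(mean, std)
```

Because the negative part of the axis is excluded, the resulting score is never smaller than the mean, and the two coincide when the mean is many standard deviations above zero.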


Figure 3. Estimated probability distributions of successful events for the networks.

Figure 4. Estimated probability distributions of failed events for the networks.

In the next stage, the decision matrix was constructed as shown in Table 2, and the weights of the criteria were then computed using the entropy weighting method (Table 3).

Table 2. Decision matrix.

Network Name   Criterion 1: Expected value   Criterion 2: Expected value
               of successful event           of failed event
Network A      10.0045                       100.991
Network B      8.3387                        2.10188
Network C      7.52409                       3.7611
Network D      6.822                         1.28134

Table 3. Results of weighting criteria by the entropy weighting method.

Entropy weighting method   Criterion 1: successful event   Criterion 2: failed event
Weight                     0.6425                          0.3575

Finally, the TOPSIS method was used to rank the networks based on these two criteria (Table 4).

Table 4. Results of ranking by TOPSIS.

Network Name                  Network A   Network B   Network C   Network D
Rank value by TOPSIS method   0           0.942537    0.766967    1

As expected, the rank values from the TOPSIS method show that network D is the best one, followed by networks B, C and A in descending order.

4. Conclusions


Ranking complex networks has a broad range of applications, such as Computer/Corporate/Campus Area Networks (CAN), telecommunication networks, electrical circuit networks, social networks, supply chains and financial networks. In this paper, we present our research effort in comparing complex networks on the basis of their positive/negative frequency data to obtain a ranking of them. The proposed method is composed of three main parts: (1) an approach for estimating the DOD of event frequencies across a network; (2) a static framework to explore the expected value of the frequency of each type of event on a random node per network, which is taken as the score of the network on the related criterion; and (3) construction of the decision matrix and use of the well-known TOPSIS method to rank the alternative networks. These algorithms were applied to several synthetic datasets. In these experiments, the framework ranked the networks in the expected order.
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ABSTRACT 

High-speed  wireless  services  have  achieved  remarkable  rates  of 
growth  in  recent  years.  In  order  to  survive  in  this  competitive 
market,  high  levels  of  service  performance  are  an  effective  way  to 
improve  customer  satisfaction  and  loyalty.  This  paper  aims  to 
identify  the  best  service  provider  in  a  heterogeneous  wireless 
network  so  that  we  differentiate  the  quality  of  service  (QoS)  and 
provide  a  framework  for  analytical  performance  evaluation.  This 
problem is considered a ranking problem in Multi-Criteria Decision Making, and so we formulate a novel method to compute the collaboration performance utility for each provider. The compromise ranking technique (called VIKOR) is used to aggregate all utility values on alternatives and compute the best level of service among providers. The experimental evaluation results demonstrate the computational efficacy of the solution approaches and derive managerial insights.

General  Terms 

Heterogeneous  Wireless  Networks  (HWNs),  Quality  of  Service 
(QoS),  Multi-Criteria  Decision  Making  (MCDM). 

Keywords 

Network  Service  Quality  Rank  (NSQR);  Correlation  Density 
Rank  (CDR);  Diversity  of  Density  (DOD). 

1.  INTRODUCTION 

High-speed  wireless  service  has  achieved  a  remarkable  market 
penetration  in  recent  years  with  numerous  service  providers.  Thus 
much  effort  has  been  concentrated  on  developing  multi-criteria 
radio  access  technology  (RAT)  selection  algorithms  for 
heterogeneous  wireless  networks  (HWNs)  [1]  based  on  criteria 
such  as  bandwidth,  maximum  data  supported,  security  level 
provided,  battery  power  consumption  etc. 

The  contributions  of  this  study  are  as  follows:  (1)  Modeling  the 
suppliers'  performance  comparison  problem  as  a  ranking  problem 
in  Multi-Criteria  Decision  Making  (MCDM)  [2],  (2)  Developing 
an  approach  to  evaluate  the  probability  of  non-occurring  negative 
events  between  two  random  users,  while  positively  discriminating 
the  events  occurring  between  a  user  and  its  significant  partners 
over  those  with  less  significant  affiliates.  (3)  Computing  the  utility 
and  efficiency  of  collaboration  between  each  pair  of  alternative 
resources. Moreover, the tradeoff between criteria can be made by the VIKOR method [3].
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2.  PROPOSED  APPROACH 

We consider this issue as a ranking problem in MCDM. Various attributes can be selected as criteria, but here we focus on negative frequent events that may occur in the interaction between users of any pair of devices or service providers during a sufficiently long period of time, and that affect the network's collaboration performance. In order to establish the decision matrix, the alternatives need to be scored on the criteria. So, we assign to each alternative the performance efficiency value for its collaboration with the other alternatives. After that, we use the VIKOR method, which introduces a multi-criteria ranking index based on the measure of "closeness" to the "ideal" solution. To score the alternatives on the criteria, we propose a new approach with the following three main steps, which are repeated for every type of failed occurrence:

Step 1: Arranging the failed frequency matrix of the heterogeneous network so that all users of the same device or service provider are placed next to each other, in order to extract the sub-frequency matrices related to the collaboration between each pair of potential resource or device types.

Step 2: Computing the items below for all sub-frequency matrices:

• The Correlation Density Rank (CDR) vector [4], which is customized in this study to compute the probability density of the given events on the users of a homogeneous network.

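1. Initialize the cost distance matrix C:

$$ C[i,j] = \log_{(1 - P^{ref}_{ij})}\bigl(1 - \exp(-\gamma f_{ij})\bigr) $$  (1)

(the logarithm of $(1 - \exp(-\gamma f_{ij}))$ to the base $(1 - P^{ref}_{ij})$)

2. M ← Normalize matrix C on columns.

3. For each node $n_j$ ($1 \le j \le l$), compute the inverse of the entropy of the related column of matrix M:

$$ e_j \leftarrow -\sum_{i} M_{ij} \ln M_{ij} $$  (2)

$$ \sigma_j \leftarrow \frac{1}{e_j} $$  (3)

4. Calculate the density function that results from a Gauss influence function:

$$ cdr_i \leftarrow \sum_{j} \exp\left(-\frac{C_{ij}^2}{2\sigma_j^2}\right) $$  (4)

5. Normalize the Correlation Density Rank vector:

$$ CDR_i \leftarrow \frac{cdr_i}{\sum_{m} cdr_m} $$  (5)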

6.  Return  CDR. 
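Steps 2 to 6 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: it assumes a square matrix of finite, strictly positive dissimilarities is already available from step 1, and that the Gauss influence function uses the squared dissimilarity in its exponent. The function name and the example matrix are hypothetical.

```python
import math

def correlation_density_rank(dist):
    """CDR-style density vector from a square matrix of finite, positive dissimilarities."""
    k = len(dist)
    # Normalize every column so that it sums to 1.
    col_sums = [sum(dist[i][j] for i in range(k)) for j in range(k)]
    m = [[dist[i][j] / col_sums[j] for j in range(k)] for i in range(k)]
    # Kernel scale of node j: inverse of the entropy of column j.
    sigma = []
    for j in range(k):
        entropy = -sum(m[i][j] * math.log(m[i][j]) for i in range(k) if m[i][j] > 0)
        sigma.append(1.0 / entropy)
    # Gauss influence function: sum the influence of every node j on node i.
    cdr = [sum(math.exp(-dist[i][j] ** 2 / (2.0 * sigma[j] ** 2)) for j in range(k))
           for i in range(k)]
    # Normalize so that the densities sum to 1.
    total = sum(cdr)
    return [value / total for value in cdr]

# Hypothetical 3-node dissimilarity matrix (illustrative values only).
dist = [[0.5, 1.0, 2.0],
        [1.0, 0.5, 2.0],
        [2.0, 2.0, 0.5]]
cdr = correlation_density_rank(dist)
```

In this example the third node is far from the other two and therefore receives the lowest density, while the first two nodes, which are placed symmetrically, receive equal densities. A column with a single non-zero entry has zero entropy and would need separate handling.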
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Unpredictability/  Diversity  of  Density  (DOD)  of  failed 
occurrences. We employ the CDR vector as the probability density distribution in the Renyi entropy formula [5]:

$$ H_{kl}^{e_i} = \frac{1}{1-\alpha} \log_2 \frac{\sum_{j} CDR_j^{\alpha}}{\sum_{j} CDR_j} $$  (6)

where $H_{kl}^{e_i}$ is the unpredictability of the failed event $e_i$ between users from the network service provider or device type k to l, and $N_k$ is the number of users from service provider or device type number k.


• The estimated probability that no failed event occurs between two random users from the two given resource or device types. Considering a Gaussian distribution with the mean and variance parameters below helps us better understand the probability distribution of the given frequent event across the kl cooperation network.


Step 3: Scoring service providers or devices based on their interaction utility and efficiency with other alternatives as criteria, and constructing the decision matrix to apply the VIKOR method for ranking the alternatives' performance. We follow a variant of the sigmoid functions to model the service user's satisfaction. The normalized user satisfaction is modeled as:


TJ*'  =  - 
u  kl 


(Ph+Pl, 


1  +e 


-PS) 


(9) 


where $U_{kl}^{e_i}$ is the perceived user satisfaction and $P_{kl}^{e_i}$ is the probability of non-occurrence, computed by Eq. (8), for event $e_i$ in the interaction from alternative resource or device type k to l. $P_{min}$ is the minimum and $P_{max}$ is the maximum probability value over all k and l. To compute the efficiency of cooperation between two resource or device types, we propose Eq. (10) to aggregate the utilities of reciprocal interactions per pair of options.


Ee< 

^(kJ) 


1 


1 


1+ logoi,)  1+ logoi) 


$W_x$ and $W_y$ are the importance weights of the sending and receiving operations, respectively, and they sum to 1.


This research is funded in part by the Army Research Laboratory under Grant No: W911NF-12-2-0067 and Army Research Office under Grant Number W911NF-11-1-0168. Any opinions, findings, conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the sponsor.


3.  EXPERIMENT 

To implement our approach, we consider a synthetic heterogeneous communication network with 20 communication modules, three alternative devices and two types of failed events during a sample time period. Assume a new user wants to join this network and must select one of the three types of devices. After the alternatives are scored on the criteria, the resulting decision matrix is shown in Table 1.

Table  1.  Decision  matrix  resulted  by  the  proposed  method 


Table 2. Results of ranking by the VIKOR method

Alternative         CModule A   CModule B   CModule C
Distance to Ideal   0.7439      0.32051     0.5

According to the final VIKOR results (Table 2), CModule type B is the closest to the ideal, followed by types C and A, as expected.


4.  CONCLUSIONS 

Ranking the performance of resources in wireless networks has a broad range of applications, such as comparing resource or device performance in computer, telecommunication, electrical, social, supply chain and financial networks. We establish an MCDM optimization model for selecting the best resource based on its efficiency of collaboration with the other alternatives with respect to the occurrence of negative frequencies. The proposed method is composed of three main parts: (1) evaluating the probability that negative events do not occur between two random users, which positively discriminates the events between a user and its important partners over those with less significant affiliates; (2) computing the utility and efficiency of collaboration between each pair of alternative resources; and (3) constructing the decision matrix and employing the well-known VIKOR method to rank the alternative resources. Finally, these algorithms were applied to a synthetic communication network. The results can automatically satisfy the requirements of QoS users preferentially.
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Abstract — To  date  the  assignment  problems  are  important 
tasks  in  recommender  systems  and  one-to-one  matching  issues 
through  social  environments.  The  various  approaches  have 
been  proposed  to  reach  these  purposes  that  are  normally 
limited  to  the  considerations  of  cost  or  profit  incurred  by  each 
possible assignment. However, most of the time, each of the alternatives on both sides of the assignment has particular criteria for judging the alternatives on the other side, whereby it can evaluate their sufficiency. In this paper, in order to obtain the
optimality  of  both  dimensions  of  assignment  we  try  to  consider 
the  concept  of  efficiency  rather  than  the  cost  or  profit  of  each 
possible  assignment.  Therefore,  the  efficient  assignment  is  the 
one  that  firstly,  has  the  maximum  optimality  in  terms  of  both 
dimensions  of  assignment,  and  secondly,  takes  into  account  the 
significance  of  judgment  of  each  assignment  from  the  viewpoint 
of  decision  maker.  To  do  this,  a  compound  index  would  be 
defined  which  includes  the  efficiency  related  to  two- 
dimensional  optimized  assignment  for  the  purpose  of 
measuring the performance of each possible assignment. Next, a mathematical programming model for the extended assignment problem is proposed, which is then expressed as a
classical  integer  linear  programming  model  to  determine  the 
assignments  with  the  maximum  efficiency.  A  numerical 
example  is  used  to  demonstrate  the  approach. 

Keywords—  Assignment  Problem;  Multicriteria  Reciprocal 
Judgments;  Two-dimensional  Utility;  Total  Efficiency  in 
Reciprocal  Optimality;  Virtual  Alternative. 


I.  Introduction 

The assignment problem is a common term in the theory of linear programming and network flows. This problem has been proposed in different forms [1], but it is most often considered in the form of finding the optimal assignment of 'n' jobs to 'n' people such that the minimum cost or maximum profit is obtained. Some of its applications can be found in [2-6]. To find effective and optimal solutions, different algorithms have been devised, including standard linear programming [7-12], the Hungarian algorithm [13], neural networks [14], and genetic algorithms [15-19]. For the standard assignment problem, only the cost or the profit of each possible assignment is considered in the formulation of the problem; in real usage, however, several types of input resources are usually needed for each possible assignment. Moreover, decision-makers can have several different objectives to achieve for each possible assignment, and the ways to achieve these objectives may conflict with each other. Campbell and Diaby [20] pointed out that demand levels in different departments, as well as the number of present workers, should be regarded as the input, and that the assignment outcomes can affect quality of service and employee satisfaction. They also emphasized that effective utilization of human resources is of utmost significance in sensitive professions such as nursing.

Bera and Suer also claim that multiple factors can affect the assignment of human resources in a manufacturing cell. Overall, different evaluation units can be used to assess the performance measurements of the objectives. These measurements are considered the outputs of the problem. The problem can have several incompatible and opposing inputs and outputs. In this regard, the author of [22] formulated a problem by considering multiple inputs and outputs for each possible assignment, and used data envelopment analysis (DEA) to measure the efficiency in the proposed approach.

Chi-Jen Lin (2011) proposes a labeling algorithm to identify two other sensitivity ranges, Type II and Type III. The algorithm uses the reduced cost matrix, provided in the final results of most solution algorithms for the assignment problem (AP), to determine the Type II range, which reflects the stability of the current optimal assignment [23]. Birger Raa et al. (2011) [24] present a MILP model for the integrated BAP-CAP, taking into account vessel priorities, preferred berthing locations and handling time considerations. Robert F. Bordley and Stephen M. Pollock (2012) [25] used an approach that maximizes organizational utility, which is assumed to be zero if any of the activities cannot meet its target (or resource allocation). In their approach, utility-based probability maximization (UPM) is a variant of stochastic optimization without recourse.

The standard assignment problem is a particular form of the transportation problem and can be formulated as a 0-1 integer linear program [26-27], as follows:

$$ \text{Min (or Max)} \quad \varphi = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} s_{ij} $$

Subject to

$$ \sum_{j=1}^{n} s_{ij} = 1, \quad i = 1, \dots, n $$

$$ \sum_{i=1}^{n} s_{ij} = 1, \quad j = 1, \dots, n $$

$$ s_{ij} = 0 \text{ or } 1 $$  (1)


Here the decision variable $s_{ij} = 1$ means that the i-th individual is assigned to the j-th job, while for $s_{ij} = 0$ no assignment is made. $c_{ij}$ is the cost (or profit) incurred by the assignment. Standard optimization software can readily solve problems formulated in this way and find the set of optimal answers that give the minimum cost or maximum profit. But it should be noted that in this formulation only the cost or profit is used to measure the performance of an assignment and, as mentioned earlier, criteria other than profit or cost could also be used to measure the performance of assignments.
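For small instances, formulation (1) can be checked by plain enumeration, since every feasible 0-1 solution corresponds to a permutation of the jobs. The sketch below is illustrative only: the function name and the cost matrix are hypothetical, and exhaustive search is practical only for small n (dedicated methods such as the Hungarian algorithm are used for larger problems).

```python
from itertools import permutations

def solve_assignment(cost, maximize=False):
    """Exhaustively solve the n x n assignment problem of formulation (1).

    Returns (best_value, assignment), where assignment[i] is the job given to
    individual i. Every permutation satisfies both constraint sets, because
    each individual gets exactly one job and each job exactly one individual.
    """
    n = len(cost)
    best_value, best_perm = None, None
    for perm in permutations(range(n)):
        value = sum(cost[i][perm[i]] for i in range(n))
        if best_value is None:
            better = True
        elif maximize:
            better = value > best_value
        else:
            better = value < best_value
        if better:
            best_value, best_perm = value, perm
    return best_value, list(best_perm)

# Hypothetical cost matrix: cost[i][j] is the cost of giving job j to individual i.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
min_cost, assignment = solve_assignment(cost)
```

For this matrix the minimum total cost is 5, obtained by giving job 1 to individual 0, job 0 to individual 1 and job 2 to individual 2 (zero-based indices).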

The basic idea of this research is derived from an assignment problem that is encountered in real situations and is not solvable with current methods. The problem is that we want to optimally assign some employees to some jobs, where each occupation requires certain capabilities and eligibility as evaluation criteria. Meanwhile, the manager, as decision maker, wants to take into account the tastes and utility of the employees with respect to each job in order to enhance their sense of job satisfaction. Moreover, each person's taste and the qualifications and capabilities needed for each job have different levels of importance. Therefore, we deal with an assignment problem with two goals: first, to maximize the degree of utility in view of each person's taste and, second, to maximize the degree of utility from the dimension of the qualification and competency needed for each occupation, according to the priority of each item. As another example, we can consider a coach as a decision maker who intends to divide his/her students into different teams in different sports with limited space; in this decision making process the coach should take into account the qualifications and capabilities required for each sport as well as the tastes of the individuals, so that the teams have the conditions required for success.

Therefore, in this study, the maximum of the total efficiency in obtaining the optimality of both dimensions of assignment is taken as the criterion of optimal assignment. The rest of the paper is organized as follows. Part two discusses the overall structure of the model and provides a definition of the problem. Part three puts forward an approach for solving the problem and finding the optimal answer. Part four presents an example to better explain the approach, and finally part five concludes the study.

II.  The  overall  structure  of  problem 

Among the basic concepts required for elaborating the model of the problem are the three concepts of 'alternative role', 'arbiter role' and 'decision maker role'. When an element has the role of an arbiter, it has some criteria for measurement and can assess and order the opposite alternatives. The element that is being judged has the role of an alternative. The element that directly uses the assessments and judgments to reach the final solution has the role of a decision maker. Therefore, an element that has the role of a decision maker definitely has the role of an arbiter, but an element with the role of an arbiter does not necessarily have the role of a decision maker.

Here, we consider the decision making system as consisting of three distinct types of elements (an "element" is a component of the decision making system that can take on one or more of the three roles of "alternative", "arbiter" or "decision maker"). The elements of types X and Y have the roles of 'alternative' and 'arbiter' reciprocally, and the third element, the Decision Maker (DM), has the role of decision maker and is the one responsible for performing the assignment (see Fig. 1). We assume there are $k$ elements of type X, denoted $X_i$, $i = 1,2,\ldots,k$; on the other side there are $l$ elements of type Y, denoted $Y_j$, $j = 1,2,\ldots,l$ ($k < l$). $C_X = \{c_1^X, c_2^X, \ldots\}$ is the reference set of attributes related to the assessment of the $X_i$, and $C_Y = \{c_1^Y, c_2^Y, \ldots, c_s^Y\}$ is the reference set of attributes related to the assessment of the $Y_j$. In this problem each element $X_i$, $i = 1,2,\ldots,k$, takes some attributes of $C_Y$ as its criteria for assessing and judging all the $Y_j$, and each $Y_j$, $j = 1,2,\ldots,l$, considers a subset of the attributes of $C_X$ for measuring and judging all the $X_i$. The DM decides on the assignment of the elements of type Y to the elements of type X and intends to perform the assignment so that maximum optimality is obtained with respect to the criteria of the elements on both sides. Note that each element $Y_j$ can be assigned to only one element $X_i$, and the assignment capacity of each $X_i$, $i = 1,2,\ldots,k$, equals $P_i$ ($P_i$ is a natural number and $\sum_{i=1}^{k} P_i \le l$).


Fig.  1.  The  structure  of  assignment  model  based  on  multicriteria  reciprocal 
judgments 

In addition, the DM may attach different significance to the judgments of the elements of X and Y; therefore, $W_X$ is the significance weight of the elements of type X and $W_Y$ is the significance weight of the elements of type Y, with $W_X + W_Y = 1$. Also, among the elements of type X, the DM may attach different importance to the individual $X_i$, in which case $w_{X_i}$, $i = 1,2,\ldots,k$, is the significance weight of element $X_i$ for the DM, such that $\sum_{i=1}^{k} w_{X_i} = 1$. In the same vein, for the elements of type Y, $w_{Y_j}$, $j = 1,2,\ldots,l$, is the significance weight of element $Y_j$ for the DM, so that $\sum_{j=1}^{l} w_{Y_j} = 1$.

Because of the reciprocity of the alternative role in this decision making model, the set of alternatives to be considered in this problem is a two-dimensional set, which can be viewed as a set of virtual alternatives that are ordered pairs, each component of which is related to one side of the assignment. Therefore the set $X \times Y$ is the set of alternatives to be considered and is represented as follows:

$A = X \times Y = \{(X_i, Y_j) \mid X_i \in X,\ Y_j \in Y\}$  (2)

Each alternative $(X_i, Y_j)$ of the set $A$ ($i = 1,2,\ldots,k$; $j = 1,2,\ldots,l$) is interpreted as the assignment of element $Y_j$ to element $X_i$. With such a definition, we are dealing with a problem of multi-criteria decision analysis that has $l \times k$ alternatives and one decision maker (DM). Note that each $X_i$, $i = 1,2,\ldots,k$, and each $Y_j$, $j = 1,2,\ldots,l$, can be an arbiter only for the alternatives in which it appears as a component. To simplify the issue, we define some restrictions of the set $A$ as follows:

$A_{X_i} := \{(X_i, Y_j) \mid Y_j \in Y\}$ ,  $A_{Y_j} := \{(X_i, Y_j) \mid X_i \in X\}$  (3)

With this definition, each element $X_i$, $i = 1,2,\ldots,k$, has the role of arbiter only toward the virtual alternatives of the set $A_{X_i}$, and each element $Y_j$, $j = 1,2,\ldots,l$, only toward the virtual alternatives of the set $A_{Y_j}$; the DM, however, has the roles of arbiter and decision maker toward all elements of the set $A$. We now look for an algorithm that yields the best assignment, with maximum optimality in terms of the elements of both type X and type Y.

III.  Problem  solving  approach 

Here we are dealing with an assignment problem in which the decision maker intends to make the assignment so that maximum optimality is obtained for both sides of the assignment. To this end, we first define and introduce notation for the ranking procedure, which is used repeatedly in the algorithm.

A.  The  ranking  procedure: 

The purpose of the ranking procedure is to identify the criteria, the value functions and the mental ideal point of the decision maker on the criteria, and to rank the alternatives by measuring the preference distance of each alternative from the ideal point, so that the alternative closest to the ideal point in terms of preference gains the first rank and the remaining alternatives obtain the subsequent ranks in the same way. The procedure is written as $rank_{\cdot}(*,*)$. As an example, assume that the element $b$ is the arbiter and the set $A = \{a_1, a_2, \ldots, a_g\}$ is the set of to-be-considered alternatives. The set $U = \{u_1, u_2, \ldots, u_q\}$ is the reference set of the criteria. Then $rank_b(A, U)$ is the ranking of the set of alternatives $A$ by the arbiter $b$, based on the criteria the arbiter chooses from the reference set $U$, and it is carried out through the following steps:


Step 1. Choosing the criteria: the arbiter is asked to choose from the reference set $U$ the subset of criteria on which he wishes to base the ranking; the set of chosen criteria is called $C$.

$C = \{c_1, c_2, \ldots, c_n\} \subseteq U$ ;  $|C| = n$  (4)

Step 2. Giving weight to the chosen criteria: in this step we can directly ask the arbiter to provide the weights of the criteria and, if that is not possible, we can calculate the weights through one of the common weighting methods, subject to the following conditions.

$\sum_{i=1}^{n} w_i = 1$ ,  $w_i > 0$ ;  $W = (w_1, w_2, \ldots, w_n)$  (5)

Step 3. Identifying the value function related to each criterion by the arbiter: in this step the arbiter is asked to identify his mental value function with respect to each criterion. In these functions, the horizontal axis represents the outcomes on the criterion in question, and the vertical axis represents the value that those outcomes have for the arbiter. Here we define three aspiration levels for the value, and we ask the arbiter to identify the value of the outcomes of each criterion based on these levels. The levels are as follows: 1. "quite dissatisfied", which has the value zero. 2. "quite satisfied", which has the value one. 3. "quite surprised", which has the value two.

If the outcome of a criterion is quite satisfactory for the arbiter, we give that outcome the value 1 on the vertical axis; in the same vein, each outcome is assigned a value equal to, smaller than or larger than 1, based on the relative satisfaction it creates for the arbiter. The further the value falls below 1, the more dissatisfied the arbiter is; the further it rises above 1, the more surprised the arbiter is. The range of the value function is thus from zero to two, where 1 indicates complete satisfaction and values from 1 to 2 indicate that the arbiter is surprised. As an example, the value function could be as follows:


Fig.  2.  Three  instances  of  identified  value  function  by  the  arbiter. 
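One simple way to encode a value function of this kind is sketched below in Python. The sketch assumes that the arbiter names three outcomes, one for each aspiration level (0, 1 and 2), that larger outcomes are preferred, and that the function is linear between those anchors; the piecewise-linear shape, the function name `make_value_function` and the salary figures are all illustrative assumptions, not part of the original model.

```python
from bisect import bisect_right


def make_value_function(dissatisfied, satisfied, surprised):
    """Build v(outcome) with range [0, 2] from three anchor outcomes.

    Assumption (not from the paper): the value is piecewise linear through
    (dissatisfied, 0), (satisfied, 1) and (surprised, 2), and larger
    outcomes are better.
    """
    if not (dissatisfied < satisfied < surprised):
        raise ValueError("anchor outcomes must be strictly increasing")
    anchors = [dissatisfied, satisfied, surprised]
    levels = [0.0, 1.0, 2.0]

    def value(outcome):
        # Clamp outside the anchors so the range stays within [0, 2].
        if outcome <= anchors[0]:
            return 0.0
        if outcome >= anchors[-1]:
            return 2.0
        k = bisect_right(anchors, outcome) - 1  # segment containing outcome
        t = (outcome - anchors[k]) / (anchors[k + 1] - anchors[k])
        return levels[k] + t * (levels[k + 1] - levels[k])

    return value


# Hypothetical criterion "salary": 2000 is quite dissatisfying,
# 3000 quite satisfying, 5000 surprising.
salary_value = make_value_function(2000, 3000, 5000)
```

A criterion for which smaller outcomes are preferred could be handled by negating the outcome before calling the function; that choice is also an assumption of this sketch.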

The value function $v_i$ is a converter that transforms the outcome of the alternative $a_j$, $j = 1,2,\ldots,g$, on the criterion $c_i$ into the value defined by the arbiter.

$v_i^j = F_i(c_i(a_j)) = v_i(a_j)$ ;  $i = 1,2,\ldots,n$ ;  $j = 1,2,\ldots,g$  (6)

We define the vector $V^j$, which has $n$ components, as the vector of value functions, and in each of its components we put the value function related to one of the criteria.

$V^j = (v_1^j, v_2^j, \ldots, v_n^j) = (v_1(a_j), v_2(a_j), \ldots, v_n(a_j))$ ;  $j = 1,2,\ldots,g$  (7)

Note: in an aggregation model, in order to obtain the overall preferences of the arbiter or decision maker over the alternatives, the assessment of each alternative is taken to be a function of the value functions and the vector of criteria weights, and the alternatives are ranked in descending order of the values obtained from this aggregation function; that is, the alternative that gains the highest value gets rank 1 and the others are ranked in the same way. However, obtaining the aggregation function is very difficult, because the independence and dependency relations among the criteria have to be considered. Sometimes considering all these relations is not practical. The model proposed here therefore obtains the ranking without using an aggregation function.


Step 4. Formation of the n-dimensional space of value functions and identification of each alternative $a_j$ ($j = 1,2,\ldots,g$) as a point with coordinates $V^j$: we represent each of the alternatives of $A = \{a_1, a_2, \ldots, a_g\}$ by an n-component vector whose i-th component ($i = 1,2,\ldots,n$; $j = 1,2,\ldots,g$) is the outcome of alternative $a_j$ on the i-th criterion. In this way the alternatives can be considered as points in the n-dimensional space of criteria. Likewise, each alternative can be represented by its vector of value functions, so the alternatives can also be considered as points in the n-dimensional space of value functions. As an example, take n = 3, meaning that there are 3 criteria; the 3-dimensional space of criteria and the 3-dimensional space of value functions are then as follows:

Fig. 3. Definition of the alternatives as the points in the space of criteria and its transference to the value functions space.

Therefore, we have an n-dimensional space, each dimension of which represents the value function related to one of the n criteria of the arbiter; each alternative $a_j$ ($j = 1,2,\ldots,g$) has n-component coordinates in this space and can be considered a point of it, such that the i-th coordinate of the alternative equals the value of its outcome on the i-th criterion of the arbiter $b$. Note that the value functions space and the coordinates of an alternative in it depend strictly on the idealizations of the arbiter $b$, since both the criteria and the value functions are chosen by the arbiter. Therefore, a particular alternative may have quite different coordinates in the value functions spaces of different arbiters, in terms of both the number of components and the value of each component. Consequently we can describe the n-dimensional value functions space of the arbiter $b$ as the "n-dimensional space of $b$'s idealization".

Since the range of the value functions $v_i$, $i = 1,2,\ldots,n$, is between zero and 2, the ideal point can be taken as the point of the value functions space all of whose coordinates are 2, that is, the alternative that has obtained the highest value in view of the arbiter; it is denoted $a^{*} = (2,2,\ldots,2)$. In a similar vein, the negative ideal point is the one that has obtained the lowest value in view of the arbiter, so all of its coordinates equal zero, and it is denoted $a^{-} = (0,0,\ldots,0)$. Note that here we assume $a^{-}$ is never a member of the alternatives to be considered, for the simple reason that the existence of an alternative with absolute zero value on all criteria is quite rare and almost impossible. If by chance such an alternative exists, it can be removed from the set of to-be-considered alternatives at the very outset.


Step 5. Calculating the closeness of relational preference of the alternatives to the ideal point and ranking them by this index: in this step, we obtain the Euclidean distance of the point representing each alternative in the value functions space from both the ideal point and the negative ideal point as follows:

$S_j^{*} = \sqrt{\sum_{i=1}^{n} w_i\,(v_i^j - 2)^2}$ ;  $j = 1,2,\ldots,g$  (8)

$S_j^{-} = \sqrt{\sum_{i=1}^{n} w_i\,(v_i^j)^2}$ ;  $j = 1,2,\ldots,g$  (9)

Here $S_j^{*}$, $j = 1,2,\ldots,g$, is the Euclidean distance of alternative $a_j$ from the ideal point and $S_j^{-}$ is the Euclidean distance of alternative $a_j$ from the negative ideal point in the value functions space. Now, in order to rank the set of alternatives $A$, we define an index termed "closeness of relational preference to the ideal point" as follows:

$RPC_j^{b} = \dfrac{S_j^{-}}{S_j^{*} + S_j^{-}}$ ;  $j = 1,2,\ldots,g$  (10)

$RPC_j^{b}$ indicates the closeness of relational preference of alternative $a_j$ to the ideal point $a^{*}$ in the idealization space of the arbiter $b$. If $a_j = a^{*}$, $RPC_j^{b}$ equals 1, and if $a_j = a^{-}$ it equals zero; but since we assume that $a^{-}$ is not a member of the to-be-considered alternatives $A$, we always have $0 < RPC_j^{b} \le 1$ ($j = 1,2,\ldots,g$). The higher the index of an alternative, the closer the alternative is to the ideal in terms of the preferences of the arbiter $b$, and at the same time the farther it is from the negative ideal. Finally we rank the alternatives of the set $A$ in descending order, from the highest closeness of relational preference to the ideal point to the lowest.

Now, for solving this problem and obtaining the most appropriate assignment, we suggest the following phases:

Phase 1: we apply the ranking procedure to every single element of types X and Y in the arbiter position:

$\forall X_i,\ i = 1,2,\ldots,k:\ rank_{X_i}(Y, C_Y)$ ;  $\forall Y_j,\ j = 1,2,\ldots,l:\ rank_{Y_j}(X, C_X)$  (11)
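The ranking procedure applied in Phase 1 can be sketched in a few lines of Python. The sketch assumes that the criterion weights $w_i$ multiply the squared coordinate differences inside the two distances and that the ideal point has every coordinate equal to 2; the function names `rpc` and `rank` and the three sample alternatives are hypothetical. Under this assumption, the figures of the numerical example in Part IV are reproduced to three decimals for the pair $(X_2, Y_2)$.

```python
import math

IDEAL = 2.0  # every value function ranges over [0, 2]


def rpc(values, weights):
    """Closeness of relational preference to the ideal point.

    values  -- value-function coordinates v_i of one alternative
    weights -- criterion weights w_i of the arbiter (assumed to weight
               the squared differences inside each distance)
    """
    s_star = math.sqrt(sum(w * (v - IDEAL) ** 2 for v, w in zip(values, weights)))
    s_minus = math.sqrt(sum(w * v ** 2 for v, w in zip(values, weights)))
    return s_minus / (s_star + s_minus)


def rank(alternatives, weights):
    """Return alternative names in descending order of RPC, plus the scores."""
    scores = {name: rpc(vals, weights) for name, vals in alternatives.items()}
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores


# Hypothetical alternatives judged by one arbiter on three criteria.
sample = {
    "a1": [1.2, 0.6, 1.0],
    "a2": [0.4, 0.2, 0.3],
    "a3": [2.0, 2.0, 2.0],  # coincides with the ideal point
}
order, scores = rank(sample, [0.5, 0.3, 0.2])
```

An alternative at the ideal point gets an index of exactly 1, and the all-zero alternative is excluded beforehand, so the division is always defined.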

Phase 2: in this phase we form the decision matrix of the problem from the results of the first phase. As we know, we are dealing with reciprocal judgments, and each assignment of a Y-type element to an X-type element forms a to-be-considered alternative, denoted $(X_i, Y_j)$, $i = 1,2,\ldots,k$ and $j = 1,2,\ldots,l$. In addition, we assess each of these alternatives on two attributes: the first attribute ($U^X$) is the relative utility of the assignment in terms of the X-type element, and the second attribute ($U^Y$) is the relative utility of the assignment in terms of the Y-type element. We denote the set of these two attributes by $U_{XY} = \{U^X, U^Y\}$. Therefore, the decision matrix structure is defined as follows:

$U_{XY}$          $U^X$          $U^Y$
$(X_1, Y_1)$    $U_{11}^X$    $U_{11}^Y$
...
$(X_i, Y_j)$    $U_{ij}^X$    $U_{ij}^Y$
...

Fig. 4. The structure of decision matrix in assignment model based on the multi-criteria reciprocal judgments.

Here $U_{ij}^X$ ($i = 1,2,\ldots,k$ and $j = 1,2,\ldots,l$) is the outcome of the judgment of the X-type element about the assignment $(X_i, Y_j)$ and equals the relative utility of element $Y_j$ in terms of element $X_i$; likewise $U_{ij}^Y$ ($i = 1,2,\ldots,k$ and $j = 1,2,\ldots,l$) is the outcome of the judgment of the Y-type element about the assignment $(X_i, Y_j)$ and equals the relative utility of element $X_i$ in view of element $Y_j$. How are these outcomes obtained? For each $i = 1,2,\ldots,k$ and $j = 1,2,\ldots,l$ we define the outcomes of the decision matrix in the following way:


$U_{ij}^X := w_{X_i} \cdot RPC_{Y_j}^{X_i}$ ;  $U_{ij}^Y := w_{Y_j} \cdot RPC_{X_i}^{Y_j}$  (12)

So for each $i = 1,2,\ldots,k$ and $j = 1,2,\ldots,l$ we have $0 < U_{ij}^X, U_{ij}^Y \le 1$.

Phase 3: Now we define an index that can be used as the decision criterion in solving the problem. This index is called "total efficiency in reciprocal optimality", and for each alternative $(X_i, Y_j)$, $i = 1,2,\ldots,k$ and $j = 1,2,\ldots,l$, we denote it by $E_{ij}$. We could consider this index as a linear combination of $U_{ij}^X$ and $U_{ij}^Y$; that is, if the decision maker (DM) gives the weight $W_X$ to the judgment of the X-type element and $W_Y$ to the judgment of the Y-type element, so that $W_X + W_Y = 1$, $0 < W_X, W_Y < 1$, then the linear combination $E_{ij} := W_X \cdot U_{ij}^X + W_Y \cdot U_{ij}^Y$ could be considered as an index for measuring the "total efficiency in reciprocal optimality". However, note that under this definition the derivative of $E_{ij}$ with respect to $U_{ij}^X$ or $U_{ij}^Y$ is a constant; that is, the ratio of the change in total efficiency to the change in the relative utility of the alternative is constant. In practice, however, the closer the relative utility is to 1 and the higher the level of satisfaction of the alternative, the lower the sensitivity to changes in optimality. For this reason, we should define the index $E_{ij}$ so that it has this characteristic. Here we define the index $E_{ij}$ for the alternative $(X_i, Y_j)$, $i = 1,2,\ldots,k$ and $j = 1,2,\ldots,l$, in the following way:

$E_{ij} := \dfrac{1}{1 + \log_{W_Y} U_{ij}^X} + \dfrac{1}{1 + \log_{W_X} U_{ij}^Y}$  (13)

in which the first term shows the efficiency of $(X_i, Y_j)$ in the X dimension and the second term shows the efficiency in the Y dimension. It is also clear that $0 < E_{ij} < 2$.

The reasons that confirm the appropriateness of the above definition for $E_{ij}$ are:

1. The function $y = \dfrac{1}{1 + \log_b x}$ ($0 < b < 1$) is shown in Fig. 5. We can see that the derivative of this function is positive (the function is increasing) and its second derivative is negative. In fact, as x moves from 0 toward 1 the slope of the graph gradually decreases, which is compatible with what is in the mind of the decision maker: the higher the relative utility and the closer it is to 1, the lower the sensitivity of the decision maker to changes in optimality. In other words, as the relative utility of an alternative gets higher, the rate of change of the efficiency becomes lower.

2. This definition properly maintains the order and density of the preferences, and calculates the efficiency with regard to the significance weight related to the judgment of each dimension, based on the relational preferences.

•  Without loss of generality, we consider the term related to the efficiency from the X dimension, $\dfrac{1}{1 + \log_{W_Y} U_{ij}^X}$; if we considered the term related to the efficiency from the Y dimension, the reasoning would be the same. Therefore, considering one of these two is sufficient.


•  We hold $U_{ij}^X$ constant and increase $W_Y$; consequently, as shown in Fig. 5, the value of $\dfrac{1}{1 + \log_{W_Y} U_{ij}^X}$ decreases; that is, the efficiency from the X dimension decreases. In other words, if we hold constant the relative utility of the alternative (the assignment of $Y_j$ to $X_i$) in view of the judgment of $X_i$ and increase $W_Y$, we have in fact decreased the significance of the judgment in the X dimension, because $W_X = 1 - W_Y$, and it is natural that the efficiency in the X dimension decreases. Conversely, if we decrease $W_Y$, then $W_X$ increases, the value of $\dfrac{1}{1 + \log_{W_Y} U_{ij}^X}$ increases, and the efficiency from the X dimension increases.

•  Now we keep $W_Y$ at a constant, fixed value and increase $U_{ij}^X$. As in Fig. 5, we see that $\dfrac{1}{1 + \log_{W_Y} U_{ij}^X}$ increases along with the increase in $U_{ij}^X$. That is, if we take $W_Y$ as constant (in fact, the significance weights of both dimensions are then constant and specified), then based on the graph in Fig. 5, the higher the relative utility of the alternative in view of the judgment of $X_i$, the higher the efficiency obtained from the X dimension. The reverse is also true: a decrease of $U_{ij}^X$ for an alternative leads to a decrease of its efficiency in the X dimension. However, note that the slope of the efficiency decreases as the relative utility increases, and this is exactly what happens in the mind of the decision maker: as the relative utility level rises, the sensitivity of the decision maker to these changes decreases, and this feature is captured in the definition of efficiency.

•  If $U_{ij}^X$ and $U_{ij}^Y$ have the same value and $W_X > W_Y$, then by the definition, the efficiency of the alternative from the X dimension is higher than its efficiency from the Y dimension.

•  If $W_X = W_Y$, meaning that the judgments of the X and Y dimensions have the same significance in the decision making, and $U_{ij}^X > U_{ij}^Y$, then by the definition, the efficiency from the X dimension is higher than the efficiency from the Y dimension.


Fig. 5. The graph of $y = \dfrac{1}{1 + \log_b x}$ when $0 < b_1 < b_2 < 1$


Note that here we are not looking for the numerical value of the efficiency index as such; what is important is that this index properly identifies the overall preference order of the to-be-considered alternatives based on the two-dimensional optimality. Since the behavior of the formulation defined for $E_{ij}$ is always consistent with the actual mentality of the decision maker, this definition seems more appropriate and efficient than the basic definition (the linear combination). So in this phase, for each assignment $(X_i, Y_j)$, $i = 1,2,\ldots,k$ and $j = 1,2,\ldots,l$, the numerical value of $E_{ij}$ is calculated based on the defined formulation.
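The efficiency index can be evaluated directly from its definition. The Python sketch below assumes the form $E_{ij} = 1/(1 + \log_{W_Y} U_{ij}^X) + 1/(1 + \log_{W_X} U_{ij}^Y)$, with utilities and weights strictly between 0 and 1; the helper names `dimension_efficiency` and `total_efficiency` are hypothetical. The sample inputs are the values reported for $(X_1, Y_1)$ in the numerical example of Part IV.

```python
import math


def dimension_efficiency(utility, other_weight):
    """One term of E_ij: 1 / (1 + log_base(utility)), base = other_weight.

    For the X dimension the base is W_Y; for the Y dimension it is W_X.
    Both arguments are assumed to lie strictly between 0 and 1, which
    makes the logarithm positive and the term fall in (0, 1).
    """
    if not (0.0 < utility < 1.0 and 0.0 < other_weight < 1.0):
        raise ValueError("utility and weight must lie strictly in (0, 1)")
    return 1.0 / (1.0 + math.log(utility, other_weight))


def total_efficiency(u_x, u_y, w_x, w_y):
    """Total efficiency in reciprocal optimality for one assignment."""
    return dimension_efficiency(u_x, w_y) + dimension_efficiency(u_y, w_x)


# Values reported for (X1, Y1) in the numerical example:
# U^X = 0.223, U^Y = 0.134, W_X = 0.7, W_Y = 0.3.
e_11 = total_efficiency(0.223, 0.134, w_x=0.7, w_y=0.3)
```

Rounding `e_11` to three decimals gives 0.596, the figure listed for this pair in the example. Utilities equal to exactly 1 are rejected here only to keep the sketch simple.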

Phase 4: In the objective function of the assignment problem, the values of $E_{ij}$ are used to identify the maximum of the two-dimensional optimality measure, and the problem is formulated in the following way:

$\max\ \varphi = \prod_{i=1}^{k} \prod_{j=1}^{l} E_{ij}^{\,s_{ij}}$
Subject to:
$\sum_{j=1}^{l} s_{ij} = P_i$ ,  $i = 1,2,\ldots,k$
$\sum_{i=1}^{k} s_{ij} \le 1$ ,  $j = 1,2,\ldots,l$
$s_{ij} = 0$ or $1$  (14)

Here $s_{ij} = 1$ means that the DM assigns element $Y_j$ to element $X_i$, while $s_{ij} = 0$ means that no such assignment takes place. Since $0 < E_{ij} < 2$ and $s_{ij} = 0$ or $1$, we can conclude that $0 < \varphi$. Therefore, by taking the logarithm of the objective function $\varphi$, we can convert the above non-linear problem into the following linear programming problem.

$\max\ \psi = \log \varphi = \sum_{i=1}^{k} \sum_{j=1}^{l} s_{ij} \log E_{ij}$
Subject to:
$\sum_{j=1}^{l} s_{ij} = P_i$ ,  $i = 1,2,\ldots,k$
$\sum_{i=1}^{k} s_{ij} \le 1$ ,  $j = 1,2,\ldots,l$
$s_{ij} = 0$ or $1$  (15)

The above linear program certainly has an optimal answer, and that answer indicates the optimal assignment of the set of Y-type elements to the set of X-type elements by the DM, considering the utility of both decision dimensions.
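For small instances the log-transformed problem can be checked by exhaustive enumeration instead of a linear programming solver. The Python sketch below makes that substitution explicitly: it assumes every $X_i$ has capacity $P_i = 1$ and each $Y_j$ is used at most once, and it simply tries every injective assignment. The function name `best_assignment` is hypothetical, and the efficiency values are those reported in the numerical example of Part IV.

```python
import math
from itertools import permutations


def best_assignment(E):
    """Maximize psi = sum of log10(E[i][j]) over one-to-one assignments.

    E[i][j] is the efficiency of assigning Y_j to X_i. Assumptions of this
    sketch: each X_i receives exactly one Y_j (P_i = 1), each Y_j is used
    at most once, and there are at least as many Y's as X's. This is
    brute-force enumeration, not an LP solver.
    """
    k, l = len(E), len(E[0])
    best_cols, best_psi = None, -math.inf
    for cols in permutations(range(l), k):  # cols[i] = index of Y given to X_i
        psi = sum(math.log10(E[i][j]) for i, j in enumerate(cols))
        if psi > best_psi:
            best_cols, best_psi = cols, psi
    return best_cols, best_psi


# Efficiency values from the numerical example (rows X1, X2; columns Y1..Y3).
E = [
    [0.596, 0.434, 0.457],
    [0.522, 0.540, 0.442],
]
assignment, psi = best_assignment(E)
```

Here `assignment` comes out as `(0, 1)`, that is, $Y_1$ to $X_1$ and $Y_2$ to $X_2$. Enumeration grows factorially with problem size, so for larger instances a dedicated assignment or integer programming solver would be the natural replacement.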


IV.  Numerical  example 

Here we describe a real experiment on the assignment of job positions to individuals in order to demonstrate the applicability of the suggested approach. A corporation manager, as decision maker, needs decision analysis techniques for assigning three employees ($Y_1$, $Y_2$, $Y_3$) to the two jobs of store management ($X_1$) and finance manager ($X_2$), so that, by considering the utility and interests of the employees as well as the competencies required for each job, the best assignment is chosen and the utility of both sides is satisfied as far as possible. Each job is to be occupied by one person, and the weights that the manager allocates to the elements of the assignment system are as follows:

$W_X = 0.7$ ,  $W_Y = 0.3$
$w_{Y_1} = 0.4$ ,  $w_{Y_2} = 0.4$ ,  $w_{Y_3} = 0.2$
$w_{X_1} = 0.6$ ,  $w_{X_2} = 0.4$

In order to extract the required data, we designed some simple question forms, customized to estimate each alternative's score on each criterion. In these forms, the person selected a number in the range 1 to 5 for each question to express his/her preference, and the overall average was the final score, which was converted to the range 0 to 1.

TABLE I. Introduction of the set of criteria $C_X$

$C_X$       $c_1^X$                $c_2^X$                  $c_3^X$           $c_4^X$
criteria    Salary and benefits    Responsibility amount    Nature of work    Popularity in that department

TABLE II. Introduction of the set of criteria $C_Y$

$C_Y$       $c_1^Y$             $c_2^Y$                  $c_3^Y$                   $c_4^Y$
criteria    Management power    Job-related education    Job-related experience    Public relations

TABLE III. Weight-giving to criteria for judgment about Ys in view of Xs

         $c_1^Y$   $c_2^Y$   $c_3^Y$   $c_4^Y$
$X_1$    0.4       0.2       0.3       0.1
$X_2$    0.4       0.1       0.4       0.1

TABLE IV. Weight-giving to criteria for judgment about Xs in view of Ys

         $c_1^X$   $c_2^X$   $c_3^X$   $c_4^X$
$Y_1$    0.5       0.3       0.2       -
$Y_2$    0.4       0.2       0.2       0.1
$Y_3$    0.3       -         0.2       0.5

TABLE V. The values of $Y_j$s on criteria in view of $X_1$

         $c_1^Y$        $c_2^Y$   $c_3^Y$   $c_4^Y$
$Y_1$    (illegible)    0.7       0.8       0.1
$Y_2$    0.2            0.3       0.2       0.1
$Y_3$    (illegible)    0.4       0.3       0.5

TABLE VI. The values of $Y_j$s on criteria in view of $X_2$

         $c_1^Y$   $c_2^Y$   $c_3^Y$   $c_4^Y$
$Y_1$    1         0.2       0.4       0.1
$Y_2$    0.2       0.5       1         0.1
$Y_3$    0.2       0.4       0.5       0.5

TABLE VII. The values of $X_i$s on criteria in view of $Y_1$

         $c_1^X$   $c_2^X$   $c_3^X$
$X_1$    0.8       0.4       0.7
$X_2$    1         0.3       0.3

TABLE VIII. The values of $X_i$s on criteria in view of $Y_2$

         $c_1^X$   $c_2^X$   $c_3^X$   $c_4^X$
$X_1$    0.7       0.3       0.2       0.1
$X_2$    1.2       0.6       1         0.4

TABLE IX. The values of $X_i$s on criteria in view of $Y_3$

         $c_1^X$   $c_3^X$   $c_4^X$
$X_1$    0.4       0.2       0.3
$X_2$    0.7       1         0.7

TABLE X. Decision matrix

$U_{XY}$         $U^X$    $U^Y$
$(X_1, Y_1)$     0.223    0.134
$(X_1, Y_2)$     0.062    0.096
$(X_1, Y_3)$     0.121    0.032
$(X_2, Y_1)$     0.128    0.138
$(X_2, Y_2)$     0.122    0.188
$(X_2, Y_3)$     0.078    0.076

TABLE XI. The total efficiency in reciprocal optimality for each possible assignment

Index            $E_{ij}$   $\log E_{ij}$
$(X_1, Y_1)$     0.596      -0.225
$(X_1, Y_2)$     0.434      -0.362
$(X_1, Y_3)$     0.457      -0.34
$(X_2, Y_1)$     0.522      -0.282
$(X_2, Y_2)$     0.54       -0.268
$(X_2, Y_3)$     0.442      -0.354

At this stage, we solve the following linear program to obtain the assignment with maximum efficiency.

$\max\ \psi = \log \varphi = -0.225\,S_{11} - 0.362\,S_{12} - 0.34\,S_{13} - 0.282\,S_{21} - 0.268\,S_{22} - 0.354\,S_{23}$
Subject to:
$S_{11} + S_{12} + S_{13} = 1$
$S_{21} + S_{22} + S_{23} = 1$
$S_{11} + S_{21} \le 1$
$S_{12} + S_{22} \le 1$
$S_{13} + S_{23} \le 1$
$S_{ij} = 0$ or $1$

The optimal answer of this linear program is $S^{*} = (1,0,0,0,1,0)$; that is, only $S_{11}$ and $S_{22}$ equal 1. This means that, according to the method proposed in this study, assigning individual $Y_1$ to the position of store management and individual $Y_2$ to the position of finance manager is the appropriate decision with regard to the judgments of both sides of the assignment. In practice, after the new managers were assigned based on the obtained results, the satisfaction survey (by question forms) showed that satisfaction in these two departments increased by over 75% compared with before, at both the upper managers' and the employees' levels.
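As a check on one entry of the decision matrix, $U_{22}^X$ can be recomputed by hand from the weights of $X_2$ (0.4, 0.1, 0.4, 0.1) and the values of $Y_2$ in view of $X_2$ (0.2, 0.5, 1, 0.1). The calculation below assumes that the criterion weights multiply the squared differences inside each distance and that the ideal point has every coordinate equal to 2; under these assumptions it arrives at the tabulated value 0.122.

```latex
\begin{align*}
S^{*} &= \sqrt{0.4(0.2-2)^2 + 0.1(0.5-2)^2 + 0.4(1-2)^2 + 0.1(0.1-2)^2}
       = \sqrt{2.282} \approx 1.511,\\
S^{-} &= \sqrt{0.4(0.2)^2 + 0.1(0.5)^2 + 0.4(1)^2 + 0.1(0.1)^2}
       = \sqrt{0.442} \approx 0.665,\\
RPC_{Y_2}^{X_2} &= \frac{S^{-}}{S^{*} + S^{-}} \approx \frac{0.665}{2.175} \approx 0.306,\\
U_{22}^{X} &= w_{X_2} \cdot RPC_{Y_2}^{X_2} \approx 0.4 \times 0.306 \approx 0.122.
\end{align*}
```

The same steps with the weights of $Y_2$ (0.4, 0.2, 0.2, 0.1) and the values of $X_2$ in view of $Y_2$ (1.2, 0.6, 1, 0.4) give $U_{22}^{Y} \approx 0.4 \times 0.469 \approx 0.188$.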

CONCLUSIONS 

This study proposed a novel viewpoint based on the reciprocal system of judgment that exists between the alternatives on the two sides of an assignment. The objective is to maximize the assignment efficiency in obtaining two-dimensional optimality, which takes the place of the cost or profit considered in the standard assignment problem. For this purpose, a compound index was defined for measuring the performance of each possible assignment in the problem formulation. A mathematical programming model was then proposed for solving the problem, and, in order to determine the assignment with maximum efficiency, it was transformed into a classic linear programming model.

PLEASE DO NOT RETURN YOUR COMPLETED FORM TO THE ABOVE ORGANIZATION. RETURN COMPLETED FORM TO THE CONTRACTING OFFICER.


1.a. NAME OF CONTRACTOR/SUBCONTRACTOR: Clark Atlanta University
1.b. ADDRESS (Include ZIP Code): 223 James P. Brawley Drive, SW, Atlanta, Georgia 30314
1.c. CONTRACT NUMBER: W911NF-12-2-0067
1.d. AWARD DATE (YYYYMMDD): 20120924

2.a. NAME OF GOVERNMENT PRIME CONTRACTOR:
2.b. ADDRESS (Include ZIP Code):
2.c. CONTRACT NUMBER:
2.d. AWARD DATE (YYYYMMDD):

3. TYPE OF REPORT (X one):  a. INTERIM   X   b. FINAL

4. REPORTING PERIOD (YYYYMMDD):  a. FROM 20120924   b. TO 20150923

SECTION I - SUBJECT INVENTIONS

5. "SUBJECT INVENTIONS" REQUIRED TO BE REPORTED BY CONTRACTOR/SUBCONTRACTOR (If "None," so state)

a. NAME(S) OF INVENTOR(S) (Last, First, Middle Initial)
b. TITLE OF INVENTION(S)
c. DISCLOSURE NUMBER, PATENT APPLICATION SERIAL NUMBER OR PATENT NUMBER
d. ELECTION TO FILE PATENT APPLICATIONS (X): (1) UNITED STATES (a) YES (b) NO; (2) FOREIGN (a) YES (b) NO
e. CONFIRMATORY INSTRUMENT OR ASSIGNMENT FORWARDED TO CONTRACTING OFFICER (X): (a) YES (b) NO

Entries: None, None, None

f. EMPLOYER OF INVENTOR(S) NOT EMPLOYED BY CONTRACTOR/SUBCONTRACTOR
(1) (a) NAME OF INVENTOR (Last, First, Middle Initial): None
    (b) NAME OF EMPLOYER: None
    (c) ADDRESS OF EMPLOYER (Include ZIP Code)
(2) (a) NAME OF INVENTOR (Last, First, Middle Initial)
    (b) NAME OF EMPLOYER
    (c) ADDRESS OF EMPLOYER (Include ZIP Code)

g. ELECTED FOREIGN COUNTRIES IN WHICH A PATENT APPLICATION WILL BE FILED
(1) TITLE OF INVENTION
(2) FOREIGN COUNTRIES OF PATENT APPLICATION

SECTION II - SUBCONTRACTS (Containing a "Patent Rights" clause)

6. SUBCONTRACTS AWARDED BY CONTRACTOR/SUBCONTRACTOR (If "None," so state)

a. NAME OF SUBCONTRACTOR(S)
b. ADDRESS (Include ZIP Code)
c. SUBCONTRACT NUMBER(S)
d. FAR "PATENT RIGHTS": (1) CLAUSE NUMBER (2) DATE (YYYYMM)
e. DESCRIPTION OF WORK TO BE PERFORMED UNDER SUBCONTRACT(S)
f. SUBCONTRACT DATES (YYYYMMDD): (1) AWARD (2) ESTIMATED COMPLETION

Entries: None, None, None


SECTION III - CERTIFICATION

7. CERTIFICATION OF REPORT BY CONTRACTOR/SUBCONTRACTOR (Not required if: (X as appropriate)) SMALL BUSINESS or NONPROFIT ORGANIZATION

I certify that the reporting party has procedures for prompt identification and timely disclosure of "Subject Inventions," that such procedures have been followed and that all "Subject Inventions" have been reported.

a. NAME OF AUTHORIZED CONTRACTOR/SUBCONTRACTOR OFFICIAL (Last, First, Middle Initial): [illegible]
b. TITLE: Asst. V[illegible]
c. SIGNATURE
d. DATE SIGNED

DD FORM 882, JUL 2005   PREVIOUS EDITION IS OBSOLETE.



